This is a FIRST submission of items concerning a filing under 35 USC 371.

This is a SECOND or SUBSEQUENT submission of items concerning a filing under 35 USC 
371. 

This is an express request to begin national examination procedures (35 USC 371(f)) at any time rather
than delay examination until the expiration of the applicable time limit set in 35 USC 371(b) and
PCT Articles 22 and 39(1).

4. (X) A proper Demand for International Preliminary Examination was made by the 19th month from the
earliest claimed priority date.

5. (X) A copy of the International Application as filed (35 USC 371(c)(2))

a) ( ) is transmitted herewith (required only if not transmitted by the International Bureau). 

b) (X) has been transmitted by the International Bureau. 

c) ( ) is not required, as the application was filed in the United States Receiving Office (RO/US).

6. ( ) A translation of the International Application into English (35 USC 371(c)(2)). 

7. (X) Amendments to the claims of the International Application under PCT Article 19 (35 USC 
c) ( ) have not been made; however, the time limit for making such amendments has NOT

8. ( ) A translation of the amendments to the claims under PCT Article 19 (35 USC 371(c)(3)).

9. ( ) An oath or declaration of the inventor(s) (35 USC 371(c)(4)). 

10. (X) A copy of the International Preliminary Examination Report with any annexes thereto, such as any
amendments made under PCT Article 34.

11. ( ) A translation of the annexes, such as any amendments made under PCT Article 34, to the
International Preliminary Examination Report under PCT Article 36 (35 USC 371(c)(5)).

Items 12 to 23 below concern other document(s) or information included:

12. ( ) An Information Disclosure Statement under 37 CFR 1.97 and 1.98.

13. ( ) An assignment document for recording. A separate cover sheet in compliance with 37 CFR 3.28
and 3.31 is included.

14. (X) A FIRST preliminary amendment. 

( ) A SECOND or SUBSEQUENT preliminary amendment.

15. ( ) A substitute specification. 

16. ( ) A power of attorney and/or address letter. 

17. (X) International Application as published. 

18. ( ) Small Entity Statement. 

19. (X) Sequence Submission Statement in 1 page.

20. (X) Paper Copy of the Sequence Listing in 13 pages. 

21. (X) Sequence Listing in Computer Readable Format.

22. (X) A return prepaid postcard. 

23. (X) The following fees are submitted: 

FEES

BASIC FEE                                                          $860.00

CLAIMS               NUMBER FILED    NUMBER EXTRA    RATE
Total Claims         16 - 20 =       0               x $18          $0
Independent Claims   6 - 3 =         3               x $80          $240.00
Multiple dependent claim(s) (if applicable)          $270           $0

TOTAL OF ABOVE CALCULATIONS                                        $1,100.00

Reduction by 1/2 for filing by small entity (if applicable). Verified Small Entity
statement must also be filed. (NOTE 37 CFR 1.9, 1.27, 1.28)        $0

TOTAL NATIONAL FEE                                                 $1,100.00

TOTAL FEES ENCLOSED                                                $1,172.00



Please credit Deposit Account No. 50-1181:                         $72.00
Amount to be charged:                                              $0


a. (X) A check in the amount of $1,172.00 to cover the above fees is enclosed.
c. (X) The Commissioner is hereby authorized to charge only those additional fees which may be
required, now or in the future, to avoid abandonment of the application, or credit any overpayment
to Deposit Account No. 50-1181. A duplicate copy of this sheet is enclosed.

NOTE: Where an appropriate time limit under 37 CFR 1.494 or 1.495 has not been met, a petition to revive 
(37 CFR 1.137(a) or (b)) must be filed and granted to restore the application to pending status. 
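The national-fee total in item 23 can be re-derived with a few lines of Python. This is a minimal sketch. The basic fee and the per-claim rates are copied from the form. The thresholds of 20 total claims and 3 independent claims are assumptions, consistent with the "6-3" entry and with the zero extra total claims shown. The function name `national_fee` is hypothetical. The sketch reproduces the $1,100.00 national fee. It also shows that the $1,172.00 enclosed exceeds that fee by $72.00.

```python
# Sketch of the fee arithmetic in item 23. Rates are copied from the form;
# the thresholds (20 total claims, 3 independent claims) are assumptions
# consistent with the "6-3" entry and the 0 extra total claims shown.

BASIC_FEE = 860.00         # basic national fee, from the form
EXTRA_CLAIM_RATE = 18.00   # per claim in excess of the total-claims threshold
EXTRA_INDEP_RATE = 80.00   # per independent claim in excess of the threshold
MULTI_DEP_FEE = 270.00     # charged only when multiple dependent claims exist


def national_fee(total_claims: int, independent_claims: int,
                 multiple_dependent: bool = False, small_entity: bool = False,
                 total_threshold: int = 20, indep_threshold: int = 3) -> float:
    """Return the total national fee in dollars for the given claim counts."""
    extra_total = max(0, total_claims - total_threshold)
    extra_indep = max(0, independent_claims - indep_threshold)
    fee = (BASIC_FEE
           + extra_total * EXTRA_CLAIM_RATE
           + extra_indep * EXTRA_INDEP_RATE
           + (MULTI_DEP_FEE if multiple_dependent else 0.0))
    # The form provides a 1/2 reduction for a verified small entity.
    return fee / 2 if small_entity else fee


fee = national_fee(total_claims=16, independent_claims=6)
print(f"total national fee: ${fee:,.2f}")                     # $1,100.00
print(f"enclosed minus national fee: ${1172.00 - fee:,.2f}")  # $72.00
```

Keeping the thresholds as keyword parameters makes the assumption explicit and easy to change if the applicable fee schedule differs.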
204, 257, 295 of SEQ ID No 4, a Cys residue at position 205 of SEQ ID No 4, and a Pro residue
at position 225 of SEQ ID No 4 [, and a Glu residue at position 252 of SEQ ID No 4]. 

40. A method for the screening of a candidate substance or molecule modulating the expression
of the hGGPS gene, said method comprising the following steps:

(a) providing a recombinant host cell expressing a nucleic acid, wherein said nucleic acid
comprises a polynucleotide according to claim 1 [a nucleotide sequence selected from the group
consisting of SEQ ID Nos 1, 2 and 3 or a fragment thereof];

(b) obtaining a candidate substance, and 

(c) determining the ability of the candidate substance to modulate the expression levels of the 
nucleotide sequence according to claim 1 [selected from the group consisting of SEQ ID Nos 1, 
2, and 3 or a fragment thereof]. 

41. A method for the screening of a candidate substance or molecule modulating the expression
of the hGGPS gene, said method comprising the following steps:

(a) providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid 
comprises a nucleotide sequence of the 5' regulatory region or a biologically active fragment of a 
polynucleotide according to claim 1 or variant thereof located upstream of a polynucleotide 
encoding a detectable protein; 

(b) obtaining a candidate substance; and 

(c) determining the ability of the candidate substance to modulate the expression levels of the 
polynucleotide encoding the detectable protein. 

REMARKS 

Amendments to the sequence listing:

The sequence listing has been amended to list the Inventors as the Applicants according 
to U.S. procedure rather than listing the Assignee as Applicant. Accordingly, no new matter has 
been introduced. 
Amendments to the claims:

Claims 2 to 9, 11 to 14, 16 to 20, 22, 23, 29 to 31 and 33 to 37 have been canceled
without prejudice. However, Applicants reserve the right to pursue the subject matter of the
canceled claims in related applications. New Claim 42, which relates to a method of making the
hGGPPS polypeptides of the present application, has been added. Support for new Claim 42 is
found throughout the specification, in particular at pages 44 to 45.

Accordingly, Claims 1 to 10, 15, 21, 24 to 28, 32 and 38 to 42 are pending in this
application.

Please charge any additional fees or credit overpayment to Deposit Account No. 50-1181.


Respectfully submitted, 




John Lucas, Ph.D., J.D.
Registration No. 43,373 


Genset Corporation 

875 Prospect Street, Suite 206 

La Jolla, CA 92037 
A nucleic acid encoding a geranylgeranyl pyrophosphate synthetase (GGPPS) and
polymorphic markers associated with said nucleic acid.

FIELD OF THE INVENTION 

The present invention relates to a purified or isolated polynucleotide encoding human
geranylgeranyl pyrophosphate synthetase, the regulatory nucleic acids contained therein, a
polymorphic marker thereof and the resulting encoded protein, as well as to methods and kits for
detecting this polynucleotide and this protein. The present invention also pertains to a polynucleotide
carrying the natural regulatory regions of the hGGPS gene which is useful, for example, to express a
heterologous nucleic acid in host cells or host organisms, as well as to functionally active regulatory
polynucleotides derived from said regulatory region. The invention also relates to genetic markers,
namely biallelic markers, which may be useful for the diagnosis of diseases related to an alteration
in the regulatory or coding regions of hGGPS, such as pathologies related to a defect in the
mevalonic acid biosynthetic pathway.

BACKGROUND OF THE INVENTION 

Prenylation is the least common known lipid modification. Other lipid modifications include
palmitylation, myristylation and glycophospholipidation. However, prenylation is a surprisingly
common form of post-translational protein modification, occurring in 0.5% of all cellular
proteins. Prenylation is a covalent modification which involves the attachment of either a C15
farnesyl or a C20 geranylgeranyl isoprenoid, both being products of the mevalonic acid biosynthetic
pathway, to one or more cysteine residues at the carboxyl terminus of the protein via a thioether
bond. The C20 geranylgeranyl modification predominates over the C15 farnesyl modification in
terms of frequency of occurrence. The structural environment of the cysteine residue determines the
specific type and number of isoprenoid groups that attach to each cysteine. The covalent
modification resulting from prenylation renders proteins more hydrophobic and, together with a
subsequent modification cascade, facilitates their association with membranes. Protein prenylation
also mediates protein-protein interactions. Prenylated proteins can be involved in signal
transduction, intracellular vesicular transport, cytoskeletal organization, cell growth control and
polarity, viral replication and protein folding/assembly. In mammals, prenylated proteins are more
frequently modified by one or more geranylgeranyl groups. Farnesylation has only been found to
occur in the retinal heterotrimeric G protein transducin, in retinal rhodopsin kinase, in Ras proteins,
in nuclear lamins, and in yeast mating factors. Geranylgeranylation is found in all of the remaining
heterotrimeric G proteins and small G proteins.

Heterotrimeric G proteins, which are required for intracellular signal transduction between
receptors and effector enzymes, present one or two prenylated subunits. This modification is often
required for association of the functional complex with the membrane.
Among small G proteins, Ras proteins, which include oncogenic forms, regulate signal
transduction pathways controlling cell proliferation and differentiation. All Ras proteins are
prenylated, and this modification is critical for their transport to the inner surface of the plasma
membrane and for their biological functions.

Other prenylated proteins belonging to the Ras protein superfamily are involved in the
regulation of intracellular vesicular transport (Rab/YPT1), in the cytoskeletal organization of
polymerized actin to produce stress fibers (Rho) or membrane ruffling (Rac), in the oxidative burst
of phagocytic cells (Rac), in the control of the cell cycle and polarity (Cdc42Hs/G25K), and in
negative growth control (Rap/Krev-1). Prenylation is important to these activities. For example,
Rab/YPT prenylation is critical for the association of these proteins with specific intracellular
compartments and for their regulation of intracellular transport processes.

One hypothesis is that, rather than only increasing hydrophobicity, the
isoprenoid acts as part of a recognition unit for specific receptors that interact with either
farnesylated or geranylgeranylated proteins. The recent observations that geranylgeranyl-modified
forms of K-Ras4B or H-Ras proteins exhibit intracellular localizations different from
those of their authentic farnesylated counterparts are consistent with this possibility.

Moreover, prenylation of nuclear lamins, which are involved in the mitotic control of
membrane assembly, is necessary for the proper assembly of these proteins into the nuclear lamina.
Indeed, prenylation is necessary for the maturation of prelamin A into lamin A by cleavage, and to
obtain functional lamin B.

Geranylgeranyl pyrophosphate synthetase (GGPS) is involved in the mevalonic acid
biosynthetic pathway and is located in the cytosol. It catalyzes the consecutive condensation of
isopentenyl diphosphate with allylic diphosphates to produce geranylgeranyl pyrophosphate (GGPP).
This biosynthesis of GGPP is regulated according to requirements for protein prenylation. GGPS has
been found to be expressed in human fetal heart, as described in PCT Application No WO 96/21736.

SUMMARY OF THE INVENTION 

The present invention pertains to nucleic acid molecules comprising the genomic sequence
of a novel human gene which encodes a hGGPPS protein. The hGGPPS genomic sequence
comprises regulatory sequences located upstream (5'-end) and downstream (3'-end) of the
transcribed portion of said gene, these regulatory sequences also being part of the invention.

The invention also deals with the complete sequence of two cDNAs encoding the hGGPPS 
protein, as well as with the corresponding translation product. 

Oligonucleotide probes or primers hybridizing specifically with hGGPPS genomic or
cDNA sequences are also part of the present invention, as well as DNA amplification and detection
methods using said primers and probes.

A further object of the invention consists of recombinant vectors comprising any of the
nucleic acid sequences described above, and in particular of recombinant vectors comprising a
hGGPPS regulatory sequence or a sequence encoding a hGGPPS protein, as well as of host cells and
transgenic non-human animals comprising said nucleic acid sequences or recombinant vectors.
The invention also concerns a hGGPPS-related biallelic marker.

Finally, the invention is directed to methods for the screening of substances or molecules
that modify or inhibit the expression of hGGPPS.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1: Map of the genomic, cDNA and coding (CDS) sequences of hGGPS: (1) upper
line, genomic sequence; (2) cDNA sequence of SEQ ID No 2; (3) coding sequence (CDS).

Figure 2: Map of the genomic, cDNA and coding (CDS) sequences of hGGPS: (1) upper
line, genomic sequence; (2) cDNA sequence of SEQ ID No 3; (3) coding sequence (CDS).

Brief Description of the sequences provided in the Sequence Listing 

SEQ ID No 1 contains a genomic sequence of hGGPPS comprising the 5' regulatory region 
(upstream untranscribed region), the exons and introns, and the 3' regulatory region (downstream 
untranscribed region). 

SEQ ID No 2 contains a cDNA sequence of hGGPPS comprising exons 1, 2, 3, and 4.

SEQ ID No 3 contains a cDNA sequence of hGGPPS comprising exons 1bis, 2, 3, and 4.

SEQ ID No 4 contains the amino acid sequence encoded by the cDNA of SEQ ID No 2 or 3.

SEQ ID Nos 5 and 6 contain the fragments containing a polymorphic base of the biallelic
marker 5-187-77.

SEQ ID No 7 contains the microsequencing primer of the biallelic marker 5-187-77.

SEQ ID Nos 8 and 9 contain the amplification primers of the biallelic marker 5-187-77.

SEQ ID No 10 contains a primer containing the additional PU 5' sequence described further
in Example 3.

SEQ ID No 11 contains a primer containing the additional RP 5' sequence described further
in Example 3.

DETAILED DESCRIPTION OF THE INVENTION 

The hGGPS gene of the invention is located on chromosome 1, and more precisely at the
1q42-1q43 locus of this chromosome. This chromosome 1 locus has been shown to carry a
predisposing gene for prostate cancer (Berthon et al., 1998).

The hGGPS gene of the invention is located in the vicinity of a retinoblastoma binding
protein gene. Indeed, the coding sequence of this latter gene is on the strand opposite to the
strand carrying the hGGPS open reading frame.

The aim of the present invention is to provide polynucleotides derived from the hGGPS
gene, particularly those useful to design suitable means for detecting the presence of this gene in a
test sample or alternatively to discriminate between the hGGPS mRNA molecules that are present in
a test sample. Other polynucleotides of the invention are useful to design suitable means to express a
desired polynucleotide of interest. The invention also relates to the hGGPS polypeptide having the
amino acid sequence of SEQ ID No 4.

Definitions 

Before describing the invention in greater detail, the following definitions are set forth to
illustrate and define the meaning and scope of the terms used to describe the invention herein.

The term "hGGPPS gene", when used herein, encompasses mRNA and cDNA sequences
encoding the hGGPPS protein. In the case of a genomic sequence, the hGGPPS gene also includes
native regulatory regions which control the expression of the coding sequence of the hGGPPS gene.

The term "functionally active fragment" of the hGGPPS protein is intended to designate a
polypeptide carrying at least one of the structural features of the hGGPPS protein involved in at least
one of the biological functions and/or activity of the hGGPPS protein.

A "heterologous" or "exogenous" polynucleotide designates a purified or isolated nucleic
acid that has been placed, by genetic engineering techniques, in the environment of unrelated
nucleotide sequences, such that the final polynucleotide construct does not occur naturally. An
illustrative, but not limitative, embodiment of such a polynucleotide construct may be represented by
a polynucleotide comprising (1) a regulatory polynucleotide derived from the hGGPPS gene
sequence and (2) a polynucleotide encoding a cytokine, for example GM-CSF. The polypeptide
encoded by the heterologous polynucleotide will be termed a heterologous polypeptide for the
purpose of the present invention.

By a "biologically active fragment or variant" of a regulatory polynucleotide according to
the present invention is intended a polynucleotide comprising, or alternatively consisting of, a
fragment of said polynucleotide which is functional as a regulatory region for expressing a
recombinant polypeptide or a recombinant polynucleotide in a recombinant cell host.

For the purpose of the invention, a nucleic acid or polynucleotide is "functional" as a
regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide if said
regulatory polynucleotide contains nucleotide sequences which contain transcriptional and
translational regulatory information, and such sequences are "operably linked" to nucleotide
sequences which encode the desired polypeptide or the desired polynucleotide. An operable linkage
is a linkage in which the regulatory nucleic acid and the DNA sequence sought to be expressed are
linked in such a way as to permit gene expression.

As used herein, the term "operably linked" refers to a linkage of polynucleotide elements in
a functional relationship. For instance, a promoter or enhancer is operably linked to a coding
sequence if it affects the transcription of the coding sequence. More precisely, two DNA molecules
(such as a polynucleotide containing a promoter region and a polynucleotide encoding a desired
polypeptide or polynucleotide) are said to be "operably linked" if the nature of the linkage between
the two polynucleotides does not (1) result in the introduction of a frame-shift mutation or (2)
interfere with the ability of the polynucleotide containing the promoter to direct the transcription of
the coding polynucleotide. The promoter polynucleotide would be operably linked to a
polynucleotide encoding a desired polypeptide or a desired polynucleotide if the promoter is capable
of effecting transcription of the polynucleotide of interest.

The terms "sample" or "material sample" are used herein to designate a solid or a liquid
material suspected to contain a polynucleotide or a polypeptide of the invention. A solid material
may be, for example, a tissue slice or biopsy in which the presence of a polynucleotide encoding a
hGGPPS protein, either a DNA or RNA molecule, is sought, or in which the presence of a native or
a mutated hGGPPS protein is sought, or alternatively the presence of a desired protein of interest
the expression of which has been placed under the control of a hGGPPS regulatory polynucleotide.
A liquid material may be, for example, any body fluid such as serum or urine, or a liquid solution
resulting from the extraction of nucleic acid or protein material of interest from a cell suspension or
from cells in a tissue slice or biopsy. The term "biological sample" is also used and is more
precisely defined within the Section dealing with DNA extraction.

As used herein, the term "purified" does not require absolute purity; rather, it is intended as
a relative definition. Purification of starting material or natural material to at least one order of
magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is
expressly contemplated. As an example, purification from 0.1% concentration to 10% concentration
is two orders of magnitude.
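The order-of-magnitude measure used in this definition is the base-10 logarithm of the fold-enrichment. A minimal Python sketch reproduces the 0.1% to 10% example, which is a 100-fold enrichment. The function name is hypothetical, and concentrations may be given in any unit because they cancel in the ratio.

```python
import math


def orders_of_magnitude(start_conc: float, final_conc: float) -> float:
    """Purification expressed as log10 of the fold-enrichment."""
    if start_conc <= 0 or final_conc <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(final_conc / start_conc)


# Example from the definition: 0.1% -> 10% (units cancel in the ratio)
print(round(orders_of_magnitude(0.1, 10.0), 6))  # 2.0
```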

The term "isolated" requires that the material be removed from its original environment (e.g.
the natural environment if it is naturally occurring). For example, a naturally-occurring
polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide
or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system,
is isolated. Such polynucleotide could be part of a vector and/or such polynucleotide or polypeptide
could be part of a composition and still be isolated in that the vector or composition is not part of its
natural environment.

The term "polypeptide" refers to a polymer of amino acids without regard to the length of
the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of
polypeptide. This term also does not specify or exclude post-expression modifications of
polypeptides; for example, polypeptides which include the covalent attachment of glycosyl groups,
acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term
polypeptide. Also included within the definition are polypeptides which contain one or more
analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids
which only occur naturally in an unrelated biological system, modified amino acids from
mammalian systems etc.), polypeptides with substituted linkages, as well as other modifications
known in the art, both naturally occurring and non-naturally occurring.

The term "recombinant polypeptide" is used herein to refer to polypeptides that have been
artificially designed and which comprise at least two polypeptide sequences that are not found as
contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides
which have been expressed from a recombinant polynucleotide.

The term "purified" is used herein to describe a polypeptide of the invention which has been
separated from other compounds including, but not limited to, nucleic acids, lipids, carbohydrates
and other proteins. A polypeptide is substantially pure when at least about 50%, preferably 60 to
75% of a sample exhibits a single polypeptide sequence. A substantially pure polypeptide typically
comprises about 50%, preferably 60 to 90% weight/weight of a protein sample, more usually about
95%, and preferably is over about 99% pure. Polypeptide purity or homogeneity is indicated by a
number of means well known in the art, such as polyacrylamide gel electrophoresis of a sample,
followed by visualizing a single polypeptide band upon staining the gel. For certain purposes higher
resolution can be provided by using HPLC or other means well known in the art.

As used herein, the term "non-human animal" refers to any non-human vertebrate, including birds
and, more usually, mammals, preferably primates, farm animals such as swine, goats, sheep, donkeys,
and horses, rabbits or rodents, more preferably rats or mice. As used herein, the term "animal" is used
to refer to any vertebrate, preferably a mammal. Both the terms "animal" and "mammal" expressly
embrace human subjects unless preceded with the term "non-human".

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides which
comprise at least one binding domain, where an antibody binding domain is formed from the
folding of variable domains of an antibody molecule to form three-dimensional binding spaces with
an internal surface shape and charge distribution complementary to the features of an antigenic
determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies
include recombinant proteins comprising the binding domains, as well as fragments, including Fab,
Fab', F(ab)2, and F(ab')2 fragments.

As used herein, an "antigenic determinant" is the portion of an antigen molecule, in this case
a hGGPPS polypeptide, that determines the specificity of the antigen-antibody reaction. An
"epitope" refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3
amino acids in a spatial conformation which is unique to the epitope. Generally an epitope consists
of at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for
determining the amino acids which make up an epitope include x-ray crystallography, 2-dimensional
nuclear magnetic resonance, and epitope mapping, e.g. the Pepscan method described by Geysen et
al. 1984; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.

Throughout the present specification, the expression "nucleotide sequence" may be
employed to designate indifferently a polynucleotide or an oligonucleotide or a nucleic acid. More
precisely, the expression "nucleotide sequence" encompasses the nucleic material itself and is thus
not restricted to the sequence information (i.e., the succession of letters chosen among the four base
letters) that biochemically characterizes a specific DNA or RNA molecule.

As used interchangeably herein, the terms "oligonucleotides" and "polynucleotides" include
RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either single chain or
duplex form. The term "nucleotide" is used herein as an adjective to describe molecules comprising
RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded or duplex form. The
term "nucleotide" is also used herein as a noun to refer to individual nucleotides or varieties of
nucleotides, meaning a molecule, or individual unit in a larger nucleic acid molecule, comprising a
purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group, or
phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide.
The term "nucleotide" is also used herein to encompass "modified nucleotides", which comprise
at least one of the following modifications: (a) an alternative linking group, (b) an analogous form
of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For examples of
analogous linking groups, purines, pyrimidines, and sugars, see for example PCT Publication
No WO 95/04064. However, the polynucleotides of the invention are preferably comprised of greater
than 50% conventional deoxyribose nucleotides, and most preferably greater than 90% conventional
deoxyribose nucleotides. The polynucleotide sequences of the invention may be prepared by any
known method, including synthetic, recombinant, ex vivo generation, or a combination thereof, as
well as utilizing any purification methods known in the art.

The term "heterozygosity rate" is used herein to refer to the incidence of individuals in a
population which are heterozygous at a particular allele. In a biallelic system, the heterozygosity
rate is on average equal to $2P_a(1-P_a)$, where $P_a$ is the frequency of the least common allele.
In order to be useful in genetic studies, a genetic marker should have an adequate level of
heterozygosity to allow a reasonable probability that a randomly selected person will be heterozygous.
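The short Python sketch below gives a quick numerical check of the biallelic heterozygosity formula by evaluating 2P_a(1 - P_a) for a few allele frequencies. It assumes Hardy-Weinberg proportions, which is what makes 2P_a(1 - P_a) the expected heterozygosity. The function name `heterozygosity_rate` is illustrative only. The expression is symmetric in P_a and 1 - P_a, so it gives the same value whichever allele's frequency is supplied. It peaks at 0.5 when both alleles are equally common.

```python
def heterozygosity_rate(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic site.

    Assumes Hardy-Weinberg proportions; p is the frequency of either allele.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2 * p * (1 - p)


# A few allele frequencies and their expected heterozygosity.
for p in (0.01, 0.10, 0.20, 0.30, 0.50):
    print(f"P_a = {p:.2f}: heterozygosity = {heterozygosity_rate(p):.4f}")
```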

The term "genotype" as used herein refers to the identity of the alleles present in an individual
or a sample. In the context of the present invention, a genotype preferably refers to the description of
the biallelic marker alleles present in an individual or a sample. The term "genotyping" a sample or
an individual for a biallelic marker consists of determining the specific allele or the specific
nucleotide carried by an individual at a biallelic marker.

The term "polymorphism" as used herein refers to the occurrence of two or more alternative
genomic sequences or alleles between or among different genomes or individuals. "Polymorphic"
refers to the condition in which two or more variants of a specific genomic sequence can be found in
a population. A "polymorphic site" is the locus at which the variation occurs. A single nucleotide
polymorphism is a single base pair change. Typically a single nucleotide polymorphism is the
replacement of one nucleotide by another nucleotide at the polymorphic site. Deletion of a single
nucleotide or insertion of a single nucleotide also gives rise to single nucleotide polymorphisms. In
the context of the present invention, "single nucleotide polymorphism" preferably refers to a single
nucleotide substitution. Typically, between different genomes or between different individuals, the
polymorphic site may be occupied by two different nucleotides.

The terms "biallelic polymorphism" and "biallelic marker" are used interchangeably herein to
refer to a single nucleotide polymorphism having two alleles at a fairly high frequency in the 
population. A "biallelic marker allele" refers to the nucleotide variants present at a biallelic marker 
site. Typically, the frequency of the less common allele of the biallelic markers of the present 
invention has been validated to be greater than 1%, preferably the frequency is greater than 10%, 
more preferably the frequency is at least 20% (i.e. heterozygosity rate of at least 0.32), even more 
preferably the frequency is at least 30% (i.e. heterozygosity rate of at least 0.42). A biallelic marker 
wherein the frequency of the less common allele is 30% or more is termed a "high quality biallelic 
marker". 
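The preference tiers above can be encoded as a lookup on the frequency of the less common allele. The tiers are greater than 1%, greater than 10%, at least 20%, and at least 30% ("high quality"). The Python sketch below also reports the heterozygosity 2p(1 - p), which reproduces the 0.32 and 0.42 figures quoted for 20% and 30%. `classify_marker` and every tier label except "high quality biallelic marker" are hypothetical names chosen for illustration, and the heterozygosity assumes Hardy-Weinberg proportions.

```python
def classify_marker(maf: float) -> tuple[str, float]:
    """Map a minor-allele frequency onto the preference tiers in the text.

    Returns (tier label, expected heterozygosity). Labels other than
    "high quality biallelic marker" are hypothetical names for the tiers.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    het = 2 * maf * (1 - maf)  # biallelic heterozygosity rate 2p(1 - p)
    if maf >= 0.30:
        label = "high quality biallelic marker"
    elif maf >= 0.20:
        label = "frequency at least 20%"
    elif maf > 0.10:
        label = "frequency greater than 10%"
    elif maf > 0.01:
        label = "frequency greater than 1%"
    else:
        label = "below the 1% validation threshold"
    return label, het


for maf in (0.05, 0.20, 0.30):
    label, het = classify_marker(maf)
    print(f"{maf:.2f}: {label} (heterozygosity {het:.2f})")
```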

The location of nucleotides in a polynucleotide with respect to the center of the
polynucleotide is described herein in the following manner. When a polynucleotide has an odd
number of nucleotides, the nucleotide at an equal distance from the 3' and 5' ends of the
polynucleotide is considered to be "at the center" of the polynucleotide, and any nucleotide
immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is
considered to be "within 1 nucleotide of the center." With an odd number of nucleotides in a
polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would be
considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even
number of nucleotides, there would be a bond and not a nucleotide at the center of the
polynucleotide. Thus, either of the two central nucleotides would be considered to be "within 1
nucleotide of the center" and any of the four nucleotides in the middle of the polynucleotide would
be considered to be "within 2 nucleotides of the center", and so on.
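The "within k nucleotides of the center" convention can be stated compactly by doubling coordinates. Doubling turns the center of an even-length polynucleotide, which falls on a bond, into an integer. The Python sketch below is one way to encode the rule, assuming 1-based positions. The rule is symmetric, so counting from either end gives the same result. `within_center` and `positions_within` are hypothetical helper names, not part of the specification.

```python
def within_center(length: int, position: int, k: int) -> bool:
    """True if 1-based `position` lies within k nucleotides of the center."""
    if not 1 <= position <= length:
        raise ValueError("position out of range")
    # Doubled coordinates: the center sits at length + 1, which is an integer
    # even when the true center falls on a bond (even length).
    return abs(2 * position - (length + 1)) <= 2 * k


def positions_within(length: int, k: int) -> list[int]:
    """All 1-based positions within k nucleotides of the center."""
    return [p for p in range(1, length + 1) if within_center(length, p, k)]


print(positions_within(7, 1))  # [3, 4, 5]        odd length: 3 positions
print(positions_within(7, 2))  # [2, 3, 4, 5, 6]  odd length: 5 positions
print(positions_within(8, 1))  # [4, 5]           even length: 2 positions
print(positions_within(8, 2))  # [3, 4, 5, 6]     even length: 4 positions
```

With k = 0 the odd-length case returns only the central nucleotide. The even-length case returns nothing, matching the statement that a bond, not a nucleotide, sits at the center.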

As used herein, the terminology "defining a biallelic marker" means that a sequence includes
a polymorphic base from a biallelic marker. The sequences defining a biallelic marker may be of
any length consistent with their intended use, provided that they contain a polymorphic base from a
biallelic marker. The sequence is between 1 and 500 nucleotides in length, preferably between 5,
10, 15, 20, 25, or 40 and 200 nucleotides and more preferably between 30 and 50 nucleotides in
length. Each biallelic marker therefore corresponds to two forms of a polynucleotide sequence 
included in a gene, which, when compared with one another, present a nucleotide modification at 
one position. Preferably, the sequences defining a biallelic marker include a polymorphic base of 
the biallelic marker 5-187-77. In some embodiments the sequences defining a biallelic marker 
comprise one of the sequences selected from the group consisting of SEQ ID Nos 5 and 6. 
Likewise, the term "marker" or "biallelic marker" requires that the sequence is of sufficient length to 
practically (although not necessarily unambiguously) identify the polymorphic allele, which usually 
implies a length of at least 4, 5, 6, 10, 15, 20, 25, or 40 nucleotides. 
The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein
to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence
identities in a manner like that found in double-helical DNA, with thymine or uracil residues linked
to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three
hydrogen bonds (see Stryer, L., Biochemistry, 4th edition, 1995).

The terms "complementary" or "complement thereof" are used herein to refer to the
sequences of polynucleotides which are capable of forming Watson & Crick base pairing with another
specified polynucleotide throughout the entirety of the complementary region. For the purpose of the
present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide
when each base in the first polynucleotide is paired with its complementary base. Complementary
bases are, generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym
for "complementary polynucleotide", "complementary nucleic acid" and "complementary
nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their
sequences and not upon any particular set of conditions under which the two polynucleotides would
actually bind.
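Because the definition above depends on sequence alone, complementarity can be checked purely by string comparison. The sketch below assumes both sequences are written 5' to 3'. Because paired strands are antiparallel, each base of the first sequence is compared with the base at the mirrored position of the second. The helper names are hypothetical.

```python
# Watson & Crick pairs named above: A-T (or A-U in RNA) and C-G.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_complementary(first: str, second: str) -> bool:
    """True if every base of `first` pairs with the facing base of `second`.

    Both sequences are assumed to be written 5'->3'; since paired strands are
    antiparallel, base i of `first` faces base i counted from the 3' end of `second`.
    """
    if len(first) != len(second):
        return False
    return all((a, b) in PAIRS for a, b in zip(first.upper(), reversed(second.upper())))


def reverse_complement(seq: str) -> str:
    """Complementary DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("ATGCC"))         # GGCAT
print(is_complementary("ATGCC", "GGCAT"))  # True
print(is_complementary("AUGCC", "GGCAT"))  # True (A-U pairing)
print(is_complementary("ATGCC", "ATGCC"))  # False
```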

Variants and fragments 

1. Polynucleotides 

The invention also relates to variants and fragments of the polynucleotides described herein, 
particularly of a hGGPPS gene containing one or more biallelic markers according to the invention. 

Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from a 
reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such as 
a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally. Such 
non-naturally occurring variants of the polynucleotide may be made by mutagenesis techniques, 
including those applied to polynucleotides, cells or organisms. Generally, differences are limited so 
that the nucleotide sequences of the reference and the variant are closely similar overall and, in many 
regions, identical. 

Variants of polynucleotides according to the invention include, without being limited to,
nucleotide sequences that are at least 95% identical to any of SEQ ID Nos 1-3 or the sequences
complementary thereto or to any polynucleotide fragment of at least 8 consecutive nucleotides of
any of SEQ ID Nos 1-3 or the sequences complementary thereto, and preferably at least 98%
identical, more particularly at least 99.5% identical, and most preferably at least 99.9% identical to
any of SEQ ID Nos 1-3 or the sequences complementary thereto or to any polynucleotide fragment
of at least 8 consecutive nucleotides of any of SEQ ID Nos 1-3 or the sequences complementary
thereto.

Changes in the nucleotide sequence of a variant may be silent, which means that they do not alter the
amino acids encoded by the polynucleotide.

However, nucleotide changes may also result in amino acid substitutions, additions,
deletions, fusions and truncations in the polypeptide encoded by the reference sequence. The
substitutions, deletions or additions may involve one or more nucleotides. The variants may be
altered in coding or non-coding regions or both. Alterations in the coding regions may produce
conservative or non-conservative amino acid substitutions, deletions or additions.
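For a single-codon change, the distinction between silent and amino acid-altering changes can be made by translating the reference and variant codons. The sketch below assumes the standard genetic code and handles codon substitutions only, not insertions or deletions. The function name and category labels are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, var_codon: str) -> str:
    """Classify a substitution within one codon by its effect on the encoded residue."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    var_aa = CODON_TABLE[var_codon.upper()]
    if ref_aa == var_aa:
        return "silent"
    if var_aa == "*":
        return "nonsense"   # premature stop: truncated polypeptide
    if ref_aa == "*":
        return "stop-loss"  # stop codon lost: extended polypeptide
    return "missense"       # amino acid substitution


print(classify_codon_change("TTA", "TTG"))  # silent (Leu -> Leu)
print(classify_codon_change("GAA", "GAT"))  # missense (Glu -> Asp)
print(classify_codon_change("TAC", "TAA"))  # nonsense (Tyr -> stop)
```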

In the context of the present invention, particularly preferred embodiments are those in 
which the polynucleotides encode polypeptides which retain substantially the same biological 
function or activity as the mature hGGPPS protein. 

A polynucleotide fragment is a polynucleotide having a sequence that is entirely the same as
part, but not all, of a given nucleotide sequence, preferably the nucleotide sequence of a hGGPPS
gene, and variants thereof. The fragment can be a portion of an exon or of an intron of a hGGPPS
gene. It can also be a portion of the regulatory sequences of the hGGPPS gene. Preferably, such
fragments comprise the polymorphic base of the biallelic marker 5-187-77 of SEQ ID Nos 5-6.

Such fragments may be "free-standing", i.e. not part of or fused to other polynucleotides, or
they may be comprised within a single larger polynucleotide of which they form a part or region.
Several fragments may also be comprised within a single larger polynucleotide.

As representative examples of polynucleotide fragments of the invention, there may be
mentioned those which have from about 4, 6, 8, 15, 20, 25, 40, 10 to 20, 10 to 30, 30 to 55, 50 to
100, 75 to 100 or 100 to 200 nucleotides in length. Preferred are those fragments having about 49
nucleotides in length, such as those of SEQ ID Nos 5-6 or the sequences complementary thereto, and
containing at least one of the biallelic markers of a hGGPPS gene which are described herein.

2. Polypeptides

The invention also relates to variants, fragments, analogs and derivatives of the polypeptides 
described herein, including mutated hGGPPS proteins. 

The variant may be 1) one in which one or more of the amino acid residues are substituted
with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue),
and such substituted amino acid residue may or may not be one encoded by the genetic code, or 2)
one in which one or more of the amino acid residues includes a substituent group, or 3) one in which
the mutated hGGPPS is fused with another compound, such as a compound to increase the half-life
of the polypeptide (for example, polyethylene glycol), or 4) one in which additional amino acids
are fused to the mutated hGGPPS, such as a leader or secretory sequence, a sequence which is
employed for purification of the mutated hGGPPS, or a preprotein sequence. Such variants are
deemed to be within the scope of those skilled in the art.

More particularly, a variant hGGPPS polypeptide comprises amino acid changes ranging
from 1, 2, 3, 4, 5, or 10 to 20 substitutions, additions or deletions of one amino acid, preferably from 1
to 10, more preferably from 1 to 5 and most preferably from 1 to 3 substitutions, additions or
deletions of one amino acid. The preferred amino acid changes are those which have little or no
influence on the biological activity or the capacity of the variant hGGPPS polypeptide to be
recognized by antibodies raised against a native hGGPPS protein.

By homologous peptide according to the present invention is meant a polypeptide containing
one or several amino acid additions, deletions and/or substitutions in the amino acid sequence of a
hGGPPS polypeptide. In the case of an amino acid substitution, one or several (consecutive or
non-consecutive) amino acids are replaced by "equivalent" amino acids.

The expression "equivalent" amino acid is used herein to designate any amino acid that may
be substituted for one of the amino acids having similar properties, such that one skilled in the art of
peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to
be substantially unchanged. Generally, the following groups of amino acids represent equivalent
changes: (1) Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr; (2) Cys, Ser, Tyr, Thr; (3) Val, Ile, Leu,
Met, Ala, Phe; (4) Lys, Arg, His; (5) Phe, Tyr, Trp, His.
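The groups listed above can be encoded as a lookup. The sketch assumes that a substitution counts as "equivalent" when both residues belong to at least one common group. That reading of the list is an assumption, and the function name is hypothetical.

```python
# Equivalence groups (1)-(5) as listed above, using three-letter residue codes.
EQUIVALENT_GROUPS = [
    {"Ala", "Pro", "Gly", "Glu", "Asp", "Gln", "Asn", "Ser", "Thr"},
    {"Cys", "Ser", "Tyr", "Thr"},
    {"Val", "Ile", "Leu", "Met", "Ala", "Phe"},
    {"Lys", "Arg", "His"},
    {"Phe", "Tyr", "Trp", "His"},
]


def is_equivalent_substitution(original: str, replacement: str) -> bool:
    """True if both residues share at least one equivalence group (assumed criterion)."""
    return any(original in group and replacement in group for group in EQUIVALENT_GROUPS)


print(is_equivalent_substitution("Leu", "Ile"))  # True, group (3)
print(is_equivalent_substitution("Lys", "Asp"))  # False, no shared group
```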

By an equivalent amino acid according to the present invention is also meant the replacement
of a residue in the L-form by a residue in the D-form, or the replacement of a glutamic acid (E)
residue by a pyroglutamic acid compound. The synthesis of peptides containing at least one residue
in the D-form is, for example, described by Koch (1977).

A specific, but not restrictive, embodiment of a modified peptide molecule of interest
according to the present invention, namely a peptide molecule which is resistant to
proteolysis, is a peptide in which the -CONH- peptide bond is modified and replaced by a (CH2NH)
reduced bond, a (NHCO) retro-inverso bond, a (CH2-O) methylene-oxy bond, a (CH2-S)
thiomethylene bond, a (CH2CH2) carba bond, a (CO-CH2) ketomethylene bond, a (CHOH-CH2)
hydroxyethylene bond, a (N-N) bond, an E-alkene bond or also a -CH=CH- bond.

The polypeptide according to the invention may have post-translational modifications. For
example, it can present the following modifications: acylation, disulfide bond formation,
prenylation, carboxymethylation and phosphorylation.

A polypeptide fragment is a polypeptide having a sequence that is entirely the same as part,
but not all, of a given polypeptide sequence, preferably a polypeptide encoded by a hGGPPS gene
and variants thereof. Preferred fragments include those regions possessing antigenic properties and
which can be used to raise antibodies against the hGGPPS protein.

Such fragments may be "free-standing", i.e. not part of or fused to other polypeptides, or
they may be comprised within a single larger polypeptide of which they form a part or region.
Several fragments may also be comprised within a single larger polypeptide.

As representative examples of polypeptide fragments of the invention, there may be
mentioned those which comprise at least about 5, 6, 7, 8, 9 or 10 to 15, 10 to 20, 15 to 40, or 30 to
55 amino acids of the hGGPPS. In some embodiments, the fragments contain at least one amino
acid mutation in the hGGPPS protein.
Identity Between Nucleic Acids Or Polypeptides

The terms "percentage of sequence identity" and "percentage homology" are used
interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are
determined by comparing two optimally aligned sequences over a comparison window, wherein the
portion of the polynucleotide or polypeptide sequence in the comparison window may comprise
additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise
additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by
determining the number of positions at which the identical nucleic acid base or amino acid residue
occurs in both sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison, and multiplying the result by
100 to yield the percentage of sequence identity. Homology is evaluated using any of the variety of
sequence comparison algorithms and programs known in the art. Such algorithms and programs
include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW
(Pearson and Lipman, 1988; Altschul et al., 1990; Thompson et al., 1994; Higgins et al., 1996;
Altschul et al., 1993). In a particularly preferred embodiment, protein and nucleic acid sequence
homologies are evaluated using the Basic Local Alignment Search Tool ("BLAST"), which is well
known in the art (see, e.g., Karlin and Altschul, 1990; Altschul et al., 1990, 1993, 1997). In
particular, five specific BLAST programs are used to perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein
sequence database;

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence
database;

(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide
sequence (both strands) against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a nucleotide sequence database
translated in all six reading frames (both strands); and

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against
the six-frame translations of a nucleotide sequence database.
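The "six-frame" translations used by BLASTX, TBLASTN and TBLASTX are the three reading frames of a nucleotide sequence plus the three reading frames of its reverse complement. The minimal sketch below illustrates only that preprocessing idea, assuming the standard genetic code. It is not how BLAST itself is implemented, and the function names are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def translate(seq: str) -> str:
    """Translate complete codons from the first base; trailing partial codons are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> list[str]:
    """Frames +1, +2, +3 on the given strand, then -1, -2, -3 on the reverse complement."""
    seq = seq.upper()
    reverse = "".join(COMPLEMENT[base] for base in reversed(seq))
    return [translate(strand[offset:]) for strand in (seq, reverse) for offset in range(3)]


print(six_frame_translations("ATGAAATAG"))
# ['MK*', '*N', 'EI', 'LFH', 'YF', 'IS']
```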

The BLAST programs identify homologous sequences by identifying similar segments, which are
referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid sequence
and a test sequence which is preferably obtained from a protein or nucleic acid sequence database.
High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring matrix,
many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62 matrix
(Gonnet et al., 1992; Henikoff and Henikoff, 1993). Less preferably, the PAM or PAM250
matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978). The BLAST programs
evaluate the statistical significance of all high-scoring segment pairs identified, and preferably
select those segments which satisfy a user-specified threshold of significance, such as a
user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is
evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990).
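The percentage-of-identity calculation defined at the start of this section can be sketched directly, assuming the two sequences have already been optimally aligned (for example by one of the programs cited) and gaps are written as "-". Every alignment column, gaps included, counts toward the comparison window. The function name is hypothetical.

```python
def percent_identity(aligned_query: str, aligned_reference: str, gap: str = "-") -> float:
    """Matched positions / positions in the comparison window x 100.

    Both inputs are rows of an existing alignment, so they have equal length;
    a column counts as matched only when both rows hold the same base or residue.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("empty comparison window")
    matches = sum(
        1
        for q, r in zip(aligned_query.upper(), aligned_reference.upper())
        if q == r and q != gap
    )
    return 100.0 * matches / len(aligned_query)


# The compared sequence carries a one-base deletion (gap) relative to the reference.
print(percent_identity("ACGT-CGA", "ACGTACGT"))  # 75.0
```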
Stringent Hybridization Conditions 

By way of example and not limitation, procedures using conditions of high stringency are as
follows: Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65°C in
buffer composed of 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll,
0.02% BSA, and 500 µg/ml denatured salmon sperm DNA. Filters are hybridized for 48 h at 65°C,
the preferred hybridization temperature, in prehybridization mixture containing 100 µg/ml denatured
salmon sperm DNA and 5-20 X 10^6 cpm of 32P-labeled probe. Alternatively, the hybridization step
can be performed at 65°C in the presence of SSC buffer, 1 x SSC corresponding to 0.15 M NaCl and
0.05 M Na citrate. Subsequently, filter washes can be done at 37°C for 1 h in a solution containing 2
x SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA, followed by a wash in 0.1 X SSC at 50°C for 45
min. Alternatively, filter washes can be performed in a solution containing 2 x SSC and 0.1% SDS,
or 0.5 x SSC and 0.1% SDS, or 0.1 x SSC and 0.1% SDS at 68°C for 15 minute intervals.
Following the wash steps, the hybridized probes are detectable by autoradiography. Other
conditions of high stringency which may be used are well known in the art, as described in Sambrook
et al., 1989, and Ausubel et al., 1989, which are incorporated herein in their entirety. These hybridization
conditions are suitable for a nucleic acid molecule of about 20 nucleotides in length. Needless to
say, the hybridization conditions described above are to be adapted according to the
length of the desired nucleic acid, following techniques well known to one skilled in the art. The
suitable hybridization conditions may, for example, be adapted according to the teachings disclosed in
the book of Hames and Higgins (1985) or in Sambrook et al. (1989).
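The wash solutions above differ mainly in SSC strength, so preparing them is simple dilution arithmetic. The sketch uses the 0.15 M NaCl per 1 x SSC stated in the text. Diluting from a 20 x SSC stock is an assumption (a common laboratory practice, not specified here), and the function names are hypothetical.

```python
NACL_MOLAR_PER_1X = 0.15  # 1 x SSC corresponds to 0.15 M NaCl (see text)


def nacl_molarity(ssc_strength: float) -> float:
    """NaCl concentration (M) contributed by SSC at the given strength (e.g. 2.0, 0.1)."""
    return ssc_strength * NACL_MOLAR_PER_1X


def stock_volume_ml(final_volume_ml: float, target_strength: float,
                    stock_strength: float = 20.0) -> float:
    """Volume of concentrated SSC stock (assumed 20 x) needed for the final wash volume."""
    if target_strength > stock_strength:
        raise ValueError("target strength exceeds stock strength")
    return final_volume_ml * target_strength / stock_strength


for strength in (2.0, 0.5, 0.1):
    print(f"{strength} x SSC: {nacl_molarity(strength):.3f} M NaCl, "
          f"{stock_volume_ml(500, strength):.1f} ml of 20 x stock per 500 ml")
```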

hGGPS gene polynucleotide, cDNAs and associated regulatory regions. 
Genomic sequences 

The invention concerns a purified or isolated nucleic acid encoding the hGGPS polypeptide, 
wherein said nucleic acid comprises the nucleotide sequence of SEQ ID No 1. 

The present invention concerns a purified or isolated nucleic acid comprising a nucleotide 
sequence of SEQ ID No 1, or a nucleotide sequence complementary thereto or a fragment or a 
variant thereof. 

Particularly preferred nucleic acids of the invention include isolated, purified, or 
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements 
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide 
positions of SEQ ID No 1: 1-485, 547-632, 827-7291, 7385-13759, 13831-14062, 14671-15054, and 
15252-17131. 
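Whether a given contiguous span of SEQ ID No 1 meets the preferred criteria above reduces to counting its overlap with the listed position ranges. The sketch below assumes 1-based, inclusive coordinates. The function names and default thresholds are illustrative choices taken from the smallest values listed.

```python
# Nucleotide positions of SEQ ID No 1 listed above (1-based, inclusive).
LISTED_RANGES = [(1, 485), (547, 632), (827, 7291), (7385, 13759),
                 (13831, 14062), (14671, 15054), (15252, 17131)]


def listed_positions_covered(span_start: int, span_end: int,
                             ranges: list[tuple[int, int]] = LISTED_RANGES) -> int:
    """Number of listed positions falling inside the span [span_start, span_end]."""
    if span_start > span_end:
        raise ValueError("span_start must not exceed span_end")
    return sum(max(0, min(span_end, hi) - max(span_start, lo) + 1) for lo, hi in ranges)


def qualifies(span_start: int, span_end: int, min_length: int = 12,
              min_positions: int = 1) -> bool:
    """Span of at least `min_length` nucleotides covering at least `min_positions` listed positions."""
    length = span_end - span_start + 1
    return length >= min_length and listed_positions_covered(span_start, span_end) >= min_positions


print(listed_positions_covered(480, 500))  # 6 (positions 480-485)
print(qualifies(486, 546))                 # False: Exon 1 lies outside the listed ranges
```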
The invention also encompasses a purified or isolated nucleic acid having at least 95%
nucleotide identity with the nucleotide sequence of SEQ ID No 1 or a complementary sequence 
thereto. 

A further object of the invention consists in a purified or isolated nucleic acid of at least 12 
nucleotides in length, wherein said nucleic acid hybridizes under stringent hybridization conditions 
with a polynucleotide sequence of SEQ ID No 1 or a complementary sequence thereto. 

The hGGPS genomic nucleic acid sequence comprises five exons. These five exons are 
described in Table A. 


Table A

Exon   Beginning position   End position     Intron   Beginning position   End position
       in SEQ ID No 1       in SEQ ID No 1            in SEQ ID No 1       in SEQ ID No 1
1      486                  546              1        547                  7291
1bis   633                  826              1bis     827                  7291
2      7292                 7384             2        7385                 13759
3      13760                13830            3        13831                14062
4      14063                15251


The hGGPS introns defined hereinafter for the purpose of the present invention are not
exactly what is generally understood as "introns" by one skilled in the art and will consequently
be defined below.

Generally, an intron is defined as a nucleotide sequence that is present both in the genomic
DNA and in the unspliced mRNA molecule, and which is absent from the mRNA molecule which
has undergone the splicing events. In the case of the hGGPS gene, the inventors have found that at
least two different spliced mRNA molecules are produced when this gene is transcribed, as will be
described in detail in a further section of the specification. The first spliced mRNA molecule
comprises Exons 1, 2, 3 and 4, as shown in Figure 1. Thus, the genomic nucleotide sequence
comprised between Exon 1 and Exon 2 is an intronic sequence with regard to this first mRNA
molecule, despite the fact that this intronic sequence contains Exon 1bis. In contrast, Exon 1bis is
an exonic nucleotide sequence with regard to the second hGGPS mRNA molecule shown in
Figure 2.

For the purpose of the present invention and in order to make a clear and unique designation 
of the different nucleic acids of the invention, it has been postulated that the polynucleotides 
contained both in the nucleotide sequence of SEQ ID No 1 and in any of the nucleotide sequences of 
SEQ ID Nos 2 or 3 are considered as exonic sequences. Conversely, the polynucleotides contained 
in the nucleotide sequence of SEQ ID No 1 and located between Exon 1 and Exon 4, but which are 
absent both from the nucleotide sequence of SEQ ID No 2 and from the nucleotide sequence of SEQ 
ID No 3 are considered as intronic sequences. 
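Under this convention, any position of SEQ ID No 1 can be labeled exonic, intronic or flanking from the Table A coordinates. The sketch assumes that the exons in Table A are exactly the stretches shared with SEQ ID Nos 2 and 3. The "flanking" label for positions outside Exon 1 to Exon 4 and the function name are illustrative.

```python
# Exon coordinates in SEQ ID No 1 from Table A (1-based, inclusive).
EXONS = {"1": (486, 546), "1bis": (633, 826), "2": (7292, 7384),
         "3": (13760, 13830), "4": (14063, 15251)}


def classify_position(pos: int) -> str:
    """Label a SEQ ID No 1 position under the exon/intron convention defined above."""
    for name, (start, end) in EXONS.items():
        if start <= pos <= end:
            return f"exonic (Exon {name})"
    # Between Exon 1 and Exon 4 but absent from both cDNAs: intronic by definition.
    if EXONS["1"][1] < pos < EXONS["4"][0]:
        return "intronic"
    return "flanking"


for pos in (100, 600, 700, 13800, 16000):
    print(pos, classify_position(pos))
```

Position 600 lies between Exon 1 and Exon 1bis and is absent from both cDNAs, so it is intronic even though Exon 1bis is exonic.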

Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising
a nucleotide sequence selected from the group consisting of the exons of the hGGPPS gene, or a
sequence complementary thereto. The invention also deals with purified, isolated, or recombinant
nucleic acids comprising a combination of at least two exons of the hGGPPS gene, wherein the
polynucleotides are arranged within the nucleic acid, from the 5'-end to the 3'-end of said nucleic
acid, in the same order as in SEQ ID No 1.

The nucleic acids defining the hGGPS introns described above, as well as their fragments
and variants, may be used as oligonucleotide primers or probes in order to detect the presence of a
copy of the hGGPS gene in a test sample, or alternatively in order to amplify a target nucleotide sequence
within the hGGPS intronic sequences.

hGGPS cDNAs

The inventors have discovered that the expression of the hGGPS gene leads to the
production of at least two mRNA molecules, respectively a first and a second hGGPS transcription
product.

The first transcription product comprises Exons 1, 2, 3 and 4. This cDNA of SEQ ID No 2
includes a 5'-UTR region spanning the whole of Exon 1 and part of Exon 2. This 5'-UTR region starts
from the nucleotide at position 1 and ends at the nucleotide in position 84 of SEQ ID No 2. The
cDNA of SEQ ID No 2 includes a 3'-UTR region starting from the nucleotide at position 988 and
ending at the nucleotide at position 1414 of SEQ ID No 2. The 3'-UTR carries a potential
polyadenylation signal located between the nucleotide in position 1289 and the nucleotide in
position 1294 of the nucleic acid of SEQ ID No 2. The ORF encoding hGGPS is comprised between
the nucleotide in position 85 and the nucleotide in position 987 of SEQ ID No 2.

The second transcription product comprises Exons 1bis, 2, 3 and 4. This cDNA of SEQ ID
No 3 includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the
nucleotide in position 217 of SEQ ID No 3. The cDNA of SEQ ID No 3 includes a 3'-UTR region
starting from the nucleotide at position 1121 and ending at the nucleotide at position 1547 of SEQ
ID No 3. The 3'-UTR carries a potential polyadenylation signal located between the nucleotide in
position 1422 and the nucleotide in position 1427 of the nucleic acid of SEQ ID No 3. The ORF
encoding hGGPS is comprised between the nucleotide in position 218 and the nucleotide in position
1120 of the nucleotide sequence of SEQ ID No 3.
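The Table A coordinates and the cDNA positions given above can be cross-checked arithmetically. Summing the exon lengths of each transcript gives 1414 and 1547 nucleotides, the last 3'-UTR positions of SEQ ID Nos 2 and 3. Each ORF spans 903 nucleotides, that is, 301 codons (300 amino acids plus the TAA stop). The two cDNAs differ only in their first exon (194 versus 61 nucleotides), which accounts for the constant offset of 133 between corresponding positions, for example ORF start 85 versus 218 and polyadenylation signal 1289 versus 1422. The sketch assumes each cDNA consists of exactly its listed exons.

```python
# Exon coordinates in SEQ ID No 1 from Table A (1-based, inclusive).
EXONS = {
    "1": (486, 546),
    "1bis": (633, 826),
    "2": (7292, 7384),
    "3": (13760, 13830),
    "4": (14063, 15251),
}
# Exon composition of each cDNA and its ORF (first base of ATG, last base of TAA).
TRANSCRIPTS = {
    "SEQ ID No 2": (["1", "2", "3", "4"], (85, 987)),
    "SEQ ID No 3": (["1bis", "2", "3", "4"], (218, 1120)),
}


def exon_length(name: str) -> int:
    start, end = EXONS[name]
    return end - start + 1


def spliced_length(exon_names: list[str]) -> int:
    return sum(exon_length(name) for name in exon_names)


for cdna, (exon_names, (orf_start, orf_end)) in TRANSCRIPTS.items():
    orf_nt = orf_end - orf_start + 1
    codons = orf_nt // 3
    print(f"{cdna}: spliced length {spliced_length(exon_names)} nt, "
          f"ORF {orf_nt} nt = {codons} codons ({codons - 1} amino acids + stop)")
# SEQ ID No 2: spliced length 1414 nt, ORF 903 nt = 301 codons (300 amino acids + stop)
# SEQ ID No 3: spliced length 1547 nt, ORF 903 nt = 301 codons (300 amino acids + stop)
```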

Another object of the invention consists of a purified or isolated nucleic acid selected from
the group consisting of the nucleotide sequences of SEQ ID Nos 2 and 3 or a complementary
sequence thereto or a fragment thereof.

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the nucleotide positions
834-1217 of SEQ ID No 2. Additional preferred nucleic acids of the invention include isolated,
purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25,
30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 3 or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the
nucleotide positions 967-1351 of SEQ ID No 3.

The invention also pertains to a purified or isolated nucleic acid having at least 95%
nucleotide identity with any of the nucleotide sequences of SEQ ID Nos 2 and 3 or a complementary
sequence thereto.

A further object of the invention consists in a purified or isolated nucleic acid of at least 12
nucleotides in length, wherein said nucleic acid hybridizes under stringent hybridization conditions
with a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos
2 and 3, or a sequence complementary thereto.

Another object of the invention consists in a purified or isolated nucleic acid comprising a
nucleic acid fragment of a nucleotide sequence selected from the group consisting of SEQ ID Nos 2
and 3, wherein this nucleic acid fragment encodes a polypeptide having an amino acid sequence
beginning at the amino acid in position 200 and ending at the amino acid in position 300 of the
hGGPS polypeptide of SEQ ID No 4, or a nucleic acid encoding a peptide fragment thereof.

Regulatory sequences

As already mentioned hereinbefore, the polynucleotide of SEQ ID No 1 contains regulatory
sequences both in the non-coding 5'-flanking region and in the non-coding 3'-flanking region that
border the hGGPS coding region.

The longest 5'-regulatory sequence of the hGGPS gene is localized between the nucleotide
in position 1 and the nucleotide in position 632 of SEQ ID No 1. However, a shorter 5'-regulatory
sequence of the hGGPS gene is localized between the nucleotide in position 1 and the nucleotide in
position 485 of SEQ ID No 1.

The hGGPS 3'-regulatory region, as shown in Figure 1, comprises a nucleotide sequence
starting from the nucleotide in position 15252 of SEQ ID No 1 and ending at the nucleotide in
position 17131 of SEQ ID No 1.

Polynucleotides derived from the hGGPS regulatory regions described above are useful to
detect the presence of at least one copy of the nucleotide sequence of SEQ ID No 1 in a test
sample.

The promoter activity of the regulatory regions contained in the hGGPS nucleotide sequence
of SEQ ID No 1 can be assessed as described below.

Genomic sequences located upstream of the hGGPS gene are cloned into a suitable promoter
reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or
pEGFP-1 Promoter Reporter vectors available from Clontech. Briefly, each of these promoter
reporter vectors includes multiple cloning sites positioned upstream of a reporter gene encoding a
readily assayable protein such as secreted alkaline phosphatase, beta-galactosidase, or green
fluorescent protein. The sequences upstream of the hGGPS coding region are inserted into the cloning
sites upstream of the reporter gene in both orientations and introduced into an appropriate host cell.
The level of reporter protein is assayed and compared to the level obtained from a vector which
lacks an insert in the cloning site. The presence of an elevated expression level in the vector 
containing the insert with respect to the control vector indicates the presence of a promoter in the 
insert. If necessary, the upstream sequences can be cloned into vectors which contain an enhancer 
for increasing transcription levels from weak promoter sequences. A significant level of expression 
above that observed with the vector lacking an insert indicates that a promoter sequence is present in 
the inserted upstream sequence. 
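The comparison with the insert-free vector amounts to computing the fold change of reporter activity over that baseline. The sketch below uses hypothetical replicate readings. The 2-fold cut-off is an arbitrary illustration and is not taken from the specification; a real analysis would apply a statistical test across replicates. The function names are hypothetical.

```python
from statistics import mean


def fold_over_empty_vector(insert_readings: list[float],
                           empty_vector_readings: list[float]) -> float:
    """Mean reporter activity of the insert construct divided by that of the empty vector."""
    baseline = mean(empty_vector_readings)
    if baseline <= 0:
        raise ValueError("empty-vector baseline must be positive")
    return mean(insert_readings) / baseline


def suggests_promoter(insert_readings: list[float], empty_vector_readings: list[float],
                      min_fold: float = 2.0) -> bool:
    # min_fold is an illustrative cut-off, not a value from the specification.
    return fold_over_empty_vector(insert_readings, empty_vector_readings) >= min_fold


# Hypothetical SEAP readings (arbitrary units) for one insert cloned in both orientations.
empty_vector = [20, 22, 21]
readings = {"forward": [200, 220, 210], "reverse": [25, 23, 24]}
for orientation, values in readings.items():
    fold = fold_over_empty_vector(values, empty_vector)
    print(orientation, round(fold, 2), suggests_promoter(values, empty_vector))
```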

Promoter sequences within the upstream genomic DNA may be further defined by 
constructing nested deletions in the upstream DNA using conventional techniques such as 
Exonuclease III digestion. The resulting deletion fragments can be inserted into the promoter 
reporter vector to determine whether the deletion has reduced or obliterated promoter activity. In this 
way, the boundaries of the promoters may be defined. If desired, potential individual regulatory sites 
within the promoter may be identified using site directed mutagenesis or linker scanning to obliterate 
potential transcription factor binding sites within the promoter individually or in combination. The 
effects of these mutations on transcription levels may be determined by inserting the mutations into 
cloning sites in promoter reporter vectors. 
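The nested-deletion strategy can be sketched as follows. Constructs share the 3' end of the 5' regulatory region (positions 1-632 of SEQ ID No 1) and start progressively further downstream. The promoter boundary is estimated as the shortest construct that still retains a chosen fraction of full-length activity. The 100-nucleotide step, the 50% retention threshold, the activity values and the function names are all hypothetical.

```python
from typing import Optional


def nested_deletions(region_start: int, region_end: int, step: int) -> list[tuple[int, int]]:
    """5' nested deletions: every construct keeps the same 3' end (next to the reporter)."""
    return [(start, region_end) for start in range(region_start, region_end, step)]


def minimal_active_construct(activity_by_start: dict[int, float], full_length_start: int,
                             keep_fraction: float = 0.5) -> Optional[int]:
    """Start of the shortest construct retaining `keep_fraction` of full-length activity,
    scanning from the longest construct and stopping at the first loss of activity."""
    threshold = keep_fraction * activity_by_start[full_length_start]
    last_active = None
    for start in sorted(activity_by_start):
        if activity_by_start[start] >= threshold:
            last_active = start
        else:
            break
    return last_active


constructs = nested_deletions(1, 632, 100)  # starts 1, 101, ..., 601; 3' end fixed at 632
hypothetical_activity = [100.0, 97.0, 94.0, 88.0, 35.0, 6.0, 3.0]
activity = dict(zip((start for start, _ in constructs), hypothetical_activity))
print(minimal_active_construct(activity, full_length_start=1))  # 301
```

In this made-up data set, activity collapses between starts 301 and 401, which would place a promoter boundary in that interval.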

Polynucleotides carrying the regulatory elements located both at the 5' end and at the 3' end
of the hGGPS coding region may be advantageously used to control the transcriptional and
translational activity of a heterologous polynucleotide of interest.

Thus, the present invention also concerns a purified or isolated nucleic acid comprising a 
polynucleotide which is selected from the group consisting of the 5' and 3' regulatory regions, or a
sequence complementary thereto or a biologically active fragment or variant thereof. "5' regulatory 
region" refers to the nucleotide sequence located between positions 1 and 632 of SEQ ID No 1. "3' 
regulatory region" refers to the nucleotide sequence located between positions 15252 and 17131 of 
SEQ ID No 1. 

The present invention is also directed to a polynucleotide comprising a functional portion of 
a regulatory region contained in the contemplated hGGPS gene and to its use in a recombinant 
expression vector carrying a polynucleotide encoding a polypeptide or a nucleic acid of interest. 

Preferred fragments of the 5' regulatory region have a length of about 400 nucleotides, more 
particularly about 300 nucleotides, more preferably 200 nucleotides and most preferably about 100 
nucleotides. 

Preferred fragments of the 3' regulatory region have a length of about 600 nucleotides, more
particularly about 300 nucleotides, more preferably 200 nucleotides and most preferably about 100 
nucleotides. 

In order to identify the relevant biologically active polynucleotide derivatives of the 5' and
3' regulatory regions, one skilled in the art will refer to the book of Sambrook et al. (1989), which
describes the use of a recombinant vector carrying a marker gene (e.g., beta-galactosidase,
chloramphenicol acetyl transferase, etc.) the expression of which will be detected when placed under
the control of a biologically active derivative polynucleotide of the 5' and 3' regulatory regions.

The regulatory polynucleotides of the invention may be prepared from a polynucleotide of
the nucleotide sequence SEQ ID No 1 by cleavage using suitable restriction enzymes, as described,
for example, in the book of Sambrook et al. (1989). The regulatory polynucleotides may also be
prepared by digestion of a polynucleotide of the nucleotide sequence SEQ ID No 1 by an
exonuclease enzyme, such as, for example, Bal31 (Wabiko et al., 1986). These regulatory
polynucleotides can also be prepared by nucleic acid chemical synthesis, as described elsewhere in
the specification.
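Choosing restriction enzymes for this purpose starts with locating their recognition sites. The sketch below finds every occurrence of a recognition site on the given strand and returns the resulting linear fragments. It uses EcoRI (GAATTC, cut between G and A) and a made-up toy sequence, since SEQ ID No 1 is not reproduced here. It ignores the complementary strand's staggered cut and degenerate sites, and the function names are hypothetical.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """0-based cut positions on the given strand: `cut_offset` bases into each site occurrence."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)  # allow overlapping occurrences
    return positions


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Fragments of a linear molecule after complete digestion (top-strand cut positions)."""
    cuts = cut_positions(seq, site, cut_offset)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


toy = "AAGAATTCTTGAATTCAA"        # hypothetical sequence with two EcoRI sites
print(digest(toy, "GAATTC", 1))   # ['AAG', 'AATTCTTG', 'AATTCAA']
```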

The regulatory polynucleotides according to the invention may be advantageously part of a
recombinant expression vector that may be used to express a coding sequence in a desired host cell
or host organism. The recombinant expression vectors according to the invention are described
elsewhere in the specification.

A preferred 5'-regulatory polynucleotide of the invention includes the 5'-untranslated region
(5'-UTR) located between the nucleotide at position 1 and the nucleotide at position 84 of SEQ ID
No 2, or a biologically active fragment or variant thereof.

Another preferred 5'-regulatory polynucleotide of the invention includes the 5'-untranslated
region (5'-UTR) located between the nucleotide at position 1 and the nucleotide at position 217 of
SEQ ID No 3, or a biologically active fragment or variant thereof.

A preferred 3'-regulatory polynucleotide of the invention includes the 3'-untranslated region
(3'-UTR) consisting of the nucleotide sequence starting from the nucleotide in position 988 and
ending at the nucleotide in position 1414 of the nucleic acid of SEQ ID No 2.

A further object of the invention consists of a purified or isolated nucleic acid comprising:

a) a nucleic acid comprising the 5' regulatory region or a biologically active fragment or
variant thereof;

b) a polynucleotide encoding a desired polypeptide or nucleic acid operably linked to the 5'
regulatory region or its biologically active fragment or variant thereof;

c) optionally, a nucleic acid comprising the 3' regulatory region or a biologically active
fragment or variant thereof.

The desired polypeptide encoded by the above described nucleic acid may be of various
natures or origins, encompassing proteins of prokaryotic or eukaryotic origin. Among the polypeptides
expressed under the control of a hGGPS regulatory region, there may be cited bacterial, fungal or
viral antigens. Also encompassed are eukaryotic proteins such as intracellular proteins, like
"housekeeping" proteins, membrane-bound proteins, like receptors, and secreted proteins, like the
numerous endogenous mediators such as cytokines.
The desired nucleic acids encoded by the above-described polynucleotide, usually an RNA
molecule, may be complementary to a desired coding polynucleotide, for example to the hGGPS
coding sequence, and thus useful as an antisense polynucleotide.

Such a polynucleotide may be included in a recombinant expression vector in order to
express the desired polypeptide or the desired nucleic acid in a host cell or in a host organism.
Suitable recombinant vectors that contain a polynucleotide such as described hereinbefore are
disclosed elsewhere in the specification.

Coding regions 

The hGGPS open reading frame is contained in the corresponding mRNAs of SEQ ID Nos 2
and 3.

More precisely, the effective hGGPS coding sequence (CDS) lies between the
nucleotide at position 85 (first nucleotide of the ATG codon) and the nucleotide at position 987 (last
nucleotide of the TAA codon) of SEQ ID No 2. A purified or isolated polynucleotide comprising the
hGGPS coding region defined above is another object of the invention.
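Since the CDS runs from position 85 to position 987, it spans 987 - 85 + 1 = 903 nucleotides, that is 301 codons (300 sense codons plus the TAA stop codon). The sketch below checks such properties (ATG start, stop codon at the end, no premature in-frame stop) for a given coordinate range. The helper `check_cds` and the synthetic stand-in sequence are hypothetical illustrations, not part of the described invention.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq: str, start: int, end: int) -> int:
    """Check that seq[start..end] (1-based, inclusive) is a complete open reading frame.

    Hypothetical helper. Returns the number of codons, counting the stop codon.
    """
    cds = seq[start - 1:end].upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("CDS length is not a positive multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG":
        raise ValueError("CDS does not start with ATG")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("CDS does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature in-frame stop codon")
    return len(codons)


# Hypothetical synthetic stand-in with the same layout as SEQ ID No 2
# (84-nt 5'-UTR, 903-nt CDS, 427-nt 3'-UTR); it is not the real sequence.
demo = "A" * 84 + "ATG" + "GCT" * 299 + "TAA" + "A" * 427
print(check_cds(demo, 85, 987))  # 301
```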

The above-disclosed polynucleotide that contains the coding sequence of the hGGPS gene of
the invention may be expressed in a desired host cell or a desired host organism when this
polynucleotide is placed under the control of suitable expression signals. The expression signals may
be either the expression signals contained in the regulatory regions of the hGGPS gene of the
invention or, alternatively, exogenous regulatory nucleic sequences. Such a polynucleotide, when
placed under the control of suitable expression signals, may also be inserted in a vector for its
expression.

Biallelic Markers

The inventors have discovered nucleotide polymorphisms located within the genomic DNA
containing the hGGPS gene, among them SNPs that are also termed biallelic markers. The
biallelic markers of the invention can be used, for example, for the generation of genetic maps, for
linkage analysis and for association studies.

A) Identification Of Biallelic Markers

There are two preferred methods through which the biallelic markers of the present
invention can be generated. In a first method, DNA samples from unrelated individuals are pooled
together, following which the genomic DNA of interest is amplified and sequenced. The nucleotide
sequences thus obtained are then analyzed to identify significant polymorphisms.

One of the major advantages of this method is that pooling the DNA samples
substantially reduces the number of DNA amplification reactions and sequencing reactions
that must be carried out. Moreover, this method is sufficiently sensitive that a biallelic marker
obtained with it usually shows a sufficient degree of informativeness for conducting association
studies.
In a second method for generating biallelic markers, the DNA samples are not pooled and
are therefore amplified and sequenced individually. The resulting nucleotide sequences are
then also analyzed to identify significant polymorphisms.

The following is a description of the various parameters of a preferred method used by the 
inventors to generate the markers of the present invention. 

1. DNA extraction

The genomic DNA samples from which the biallelic markers of the present invention are 
generated are preferably obtained from unrelated individuals corresponding to a heterogeneous 
population of known ethnic background. 

The number of individuals from whom DNA samples are obtained can vary substantially,
preferably from about 10 to about 1000, more preferably from about 50 to about 200 individuals. It is
usually preferred to collect DNA samples from at least about 100 individuals in order to have
sufficient polymorphic diversity in a given population to identify as many markers as possible and to
generate statistically significant results.

As for the source of the genomic DNA to be subjected to analysis, any test sample can be 
foreseen without any particular limitation. These test samples include biological samples which can 
be tested by the methods of the present invention described herein and include human and animal 
body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and 
various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk, 
white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed 
tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow 
aspirates and fixed cell specimens. The preferred source of genomic DNA used in the context of the 
present invention is from peripheral venous blood of each donor. 

The techniques of DNA extraction are well-known to the skilled technician. Such techniques 
are described notably by Lin et al. (1998) and by Mackey et al. (1998). Details of a preferred 
embodiment are provided in Example 2. 

2. DNA amplification 

The identification of biallelic markers in a sample of genomic DNA may be facilitated
through the use of DNA amplification methods. DNA samples can be pooled or unpooled for the
amplification step. DNA amplification techniques are well known to those skilled in the art.

Amplification techniques that can be used in the context of the present invention include, but
are not limited to, the ligase chain reaction (LCR) described in EP-A-320 308, WO 9320227 and
EP-A-439 182, the polymerase chain reaction (PCR, RT-PCR) and techniques such as the nucleic
acid sequence based amplification (NASBA) described in Guatelli J.C. et al. (1990) and in Compton
J. (1991), Q-beta amplification as described in European Patent Application No 4544610, strand
displacement amplification as described in Walker et al. (1996) and EP-A-684 315, and target
mediated amplification as described in PCT Publication WO 9322461.
LCR and Gap LCR are exponential amplification techniques, and both depend on DNA ligase to
join adjacent primers annealed to a DNA molecule. In Ligase Chain Reaction (LCR), probe pairs
are used which include two primary (first and second) and two secondary (third and fourth) probes,
all of which are employed in molar excess to target. The first probe hybridizes to a first segment of
the target strand and the second probe hybridizes to a second segment of the target strand, the first
and second segments being contiguous so that the primary probes abut one another in a
5'-phosphate/3'-hydroxyl relationship, and so that a ligase can covalently fuse or ligate the two
probes into a fused product. In addition, a third (secondary) probe can hybridize to a portion of the
first probe and a fourth (secondary) probe can hybridize to a portion of the second probe in a similar
abutting fashion. Of course, if the target is initially double stranded, the secondary probes will also
hybridize to the target complement in the first instance. Once the ligated strand of primary probes is
separated from the target strand, it will hybridize with the third and fourth probes, which can be
ligated to form a complementary, secondary ligated product. It is important to realize that the ligated
products are functionally equivalent to either the target or its complement. By repeated cycles of
hybridization and ligation, amplification of the target sequence is achieved. A method for multiplex
LCR has also been described (WO 9320227). Gap LCR (GLCR) is a version of LCR in which the
probes are not adjacent but are separated by 2 to 3 bases.

For amplification of mRNAs, it is within the scope of the present invention to reverse
transcribe mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or to use a single
enzyme for both steps as described in U.S. Patent No. 5,322,770; or to use Asymmetric Gap LCR
(RT-AGLCR) as described by Marshall et al. (1994). AGLCR is a modification of GLCR that
allows the amplification of RNA.

The PCR technology is the preferred amplification technique used in the present invention.
A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR
technology, see White (1997) and the publication entitled "PCR Methods and Applications" (1991,
Cold Spring Harbor Laboratory Press). In each of these PCR procedures, PCR primers on either
side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid
sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase,
or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are
specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized
primers are extended. Thereafter, another cycle of denaturation, hybridization, and extension is
initiated. The cycles are repeated multiple times to produce an amplified fragment containing the
nucleic acid sequence between the primer sites. PCR has further been described in several patents,
including US Patents 4,683,195; 4,683,202; and 4,965,188.
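The repeated cycles described above amplify the target exponentially. Under the idealized assumption of a constant per-cycle efficiency E (E = 1 meaning perfect doubling), the copy number after n cycles is N = N0 × (1 + E)^n. The hypothetical helpers below illustrate only that simple model; they ignore the plateau that real reactions reach in late cycles.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield N0 * (1 + E)**n (hypothetical helper; no plateau modeled)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest number of idealized cycles for which the yield reaches target_copies."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    cycles, copies = 0, initial_copies
    while copies < target_copies:
        copies *= 1.0 + efficiency  # one round of denaturation, annealing, extension
        cycles += 1
    return cycles


print(pcr_copies(100, 30))      # 107374182400.0, i.e. 100 * 2**30
print(cycles_needed(100, 1e9))  # 24
```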

The PCR technology is the preferred amplification technique used to identify new biallelic
markers. A typical example of a PCR reaction suitable for the purposes of the present invention is
provided in Example 3.
One of the aspects of the present invention is a method for the amplification of the human
hGGPPS gene, particularly of a fragment of the genomic sequence of SEQ ID No 1 or of the cDNA
sequence of SEQ ID No 2 or 3, or a fragment or a variant thereof, in a test sample, preferably using
the PCR technology. This method comprises the steps of:

a) contacting a test sample with amplification reaction reagents comprising a pair of 
amplification primers as described above and located on either side of the polynucleotide 
region to be amplified, and 

b) optionally, detecting the amplification products. 
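The requirement that the two primers lie on either side of the region to be amplified can be illustrated with a small in silico sketch. It assumes exact primer matches on a single template strand written 5' to 3', with the reverse primer given in its own 5'-to-3' orientation, so that its reverse complement must occur downstream of the forward primer. The function names and sequences are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the product a primer pair would amplify from template, or None.

    Hypothetical helper using exact matches only; real primers can tolerate
    mismatches away from their 3' end.
    """
    f = template.find(forward)
    if f == -1:
        return None
    # The reverse primer anneals to this strand, so its reverse complement
    # must appear downstream of the forward primer.
    site = reverse_complement(reverse)
    r = template.find(site, f + len(forward))
    if r == -1:
        return None
    return template[f:r + len(site)]


template = "GGGG" + "ACGTAC" + "T" * 8 + "AACCGG" + "CCCC"  # hypothetical template
print(in_silico_amplicon(template, "ACGTAC", "CCGGTT"))     # ACGTACTTTTTTTTAACCGG
```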

The invention also concerns a kit for the amplification of a hGGPPS gene sequence, 
particularly of a portion of the genomic sequence of SEQ ID No 1 or of the cDNA sequence of SEQ 
ID No 2 or 3, or a variant thereof in a test sample, wherein said kit comprises: 

a) a pair of oligonucleotide primers located on either side of the hGGPPS region to be 
amplified; 

b) optionally, the reagents necessary for performing the amplification reaction. 

In one embodiment of the above amplification method and kit, the amplification product is 
detected by hybridization with a labeled probe having a sequence which is complementary to the 
amplified region. In another embodiment of the above amplification method and kit, primers 
comprise a sequence which is selected from the group consisting of SEQ ID Nos 7-9. 

In a first embodiment of the present invention, biallelic markers are identified using genomic 
sequence information generated by the inventors. Sequenced genomic DNA fragments are used to 
design primers for the amplification of 500 bp fragments. These 500 bp fragments are amplified 
from genomic DNA and are scanned for biallelic markers. Primers may be designed using the OSP 
software (Hillier L. and Green P., 1991). All primers may contain, upstream of the specific target 
bases, a common oligonucleotide tail that serves as a sequencing primer. Those skilled in the art are 
familiar with primer extensions, which can be used for these purposes. 

Preferred primers, useful for the amplification of genomic sequences encoding the candidate 
genes, focus on promoters, exons and splice sites of the genes. A biallelic marker is more likely
to be a causal mutation if it is located in one of these functional regions of the gene.
Preferred amplification primers of the invention include the nucleotide sequences of SEQ ID Nos 8 
and 9. 

Other preferred primers according to the invention allow the amplification of various 
fragments of the purified or isolated nucleic acid of SEQ ID No 1. These primers are presented
below as couples of forward and reverse primers that may be used together to amplify a desired 
nucleotide sequence. 


Position range of forward        Complementary position range of
primers in SEQ ID No 1           reverse primer in SEQ ID No 1

7233-7251                        7565-7582
13582-13600                      13982-14001
14222-14240                      14626-14645
14606-14623                      15007-15026
14845-14864                      15246-15265
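Assuming each product runs from the first position of the forward primer to the last position of the complementary range of the reverse primer, the expected product sizes for the primer couples in the table above can be computed as follows. This is an arithmetic illustration on the stated coordinates, not a result reported in the text; the names are hypothetical.

```python
# Primer couples from the table above:
# (forward start, forward end, reverse start, reverse end) in SEQ ID No 1
PRIMER_PAIRS = [
    (7233, 7251, 7565, 7582),
    (13582, 13600, 13982, 14001),
    (14222, 14240, 14626, 14645),
    (14606, 14623, 15007, 15026),
    (14845, 14864, 15246, 15265),
]


def amplicon_length(f_start: int, r_end: int) -> int:
    """Product length, assuming it spans f_start..r_end inclusive (hypothetical helper)."""
    return r_end - f_start + 1


for f_start, f_end, r_start, r_end in PRIMER_PAIRS:
    size = amplicon_length(f_start, r_end)
    print(f"{f_start}-{r_end}: {size} bp "
          f"(primers of {f_end - f_start + 1} and {r_end - r_start + 1} nt)")
```

On these assumptions the five products are 350, 420, 424, 421 and 421 bp long, and the primers are 18 to 20 nucleotides long.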


The primers described above are each individually useful as oligonucleotide probes for
detecting the corresponding hGGPS nucleotide sequence in a sample, and more preferably for
detecting the presence of a hGGPS DNA molecule in a sample suspected to contain it.

3. Sequencing of amplified genomic DNA and identification of polymorphisms

The amplification products generated as described above are then sequenced using any
method known and available to the skilled technician. Methods for sequencing DNA using either
the dideoxy-mediated method (Sanger method) or the Maxam-Gilbert method are widely known to
those of ordinary skill in the art. Such methods are, for example, disclosed in Sambrook et al. (1989).
Alternative approaches include hybridization to high-density DNA probe arrays as described in Chee
et al. (1996).

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing 
reactions using a dye-primer cycle sequencing protocol. The products of the sequencing reactions 
are run on sequencing gels and the sequences are determined using gel image analysis. The 
polymorphism search is based on the presence of superimposed peaks in the electrophoresis pattern 
resulting from different bases occurring at the same position. Because each dideoxy terminator is 
labeled with a different fluorescent molecule, the two peaks corresponding to a biallelic site present 
distinct colors corresponding to two different nucleotides at the same position on the sequence. 
However, the presence of two peaks can be an artifact due to background noise. To exclude such an 
artifact, the two DNA strands are sequenced and a comparison between the peaks is carried out. In 
order to be registered as a polymorphic sequence, the polymorphism has to be detected on both 
strands. 
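The two-strand rule for registering a polymorphism can be sketched as a filter over per-position peak calls. The data layout (dictionaries mapping a forward-strand position to the set of bases seen as superimposed peaks, with reverse-read bases still expressed as reverse-strand bases) and the function name are assumptions made for illustration.

```python
from typing import Dict, Set

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def confirmed_polymorphisms(forward_calls: Dict[int, Set[str]],
                            reverse_calls: Dict[int, Set[str]]) -> Dict[int, Set[str]]:
    """Keep only sites where both strands show the same two bases (hypothetical helper).

    Positions are assumed to be already mapped to forward-strand coordinates;
    reverse-strand bases are complemented here before comparison.
    """
    confirmed = {}
    for pos, fwd in forward_calls.items():
        if len(fwd) != 2 or pos not in reverse_calls:
            continue  # single peak, or no reverse read covering this site
        rev = {COMPLEMENT[base] for base in reverse_calls[pos]}
        if rev == fwd:
            confirmed[pos] = fwd  # double peak seen on both strands
    return confirmed


# Hypothetical calls: site 120 is seen on both strands, site 250 only on one
fwd = {120: {"C", "T"}, 250: {"A", "G"}}
rev = {120: {"G", "A"}, 250: {"T"}}
print(confirmed_polymorphisms(fwd, rev))
```

Site 250 is dropped because its second peak appears on one strand only, which is the background-noise artifact the rule is meant to exclude.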

The above procedure permits the amplification products that contain biallelic markers
to be identified. The detection limit for the frequency of biallelic polymorphisms detected by
sequencing pools of 100 individuals is approximately 0.1 for the minor allele, as verified by
sequencing pools of known allelic frequencies. However, more than 90% of the biallelic
polymorphisms detected by the pooling method have a frequency for the minor allele higher than
0.25. Therefore, the biallelic markers selected by this method have a frequency of at least 0.1 for the
minor allele and less than 0.9 for the major allele, preferably at least 0.2 for the minor allele and less
than 0.8 for the major allele, and more preferably at least 0.3 for the minor allele and less than 0.7 for
the major allele, corresponding to a heterozygosity rate higher than 0.18, preferably higher than 0.32,
and more preferably higher than 0.42.
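The heterozygosity figures quoted above follow from the minor allele frequency thresholds through the expected heterozygosity H = 2p(1 - p), assuming Hardy-Weinberg proportions: 2 × 0.1 × 0.9 = 0.18, 2 × 0.2 × 0.8 = 0.32 and 2 × 0.3 × 0.7 = 0.42. A short check (the function name is hypothetical):

```python
def expected_heterozygosity(minor_allele_freq: float) -> float:
    """Expected heterozygosity 2pq of a biallelic marker under Hardy-Weinberg proportions.

    p is the minor allele frequency and q = 1 - p the major allele frequency.
    """
    p = minor_allele_freq
    if not 0.0 <= p <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    return 2 * p * (1 - p)


for p in (0.1, 0.2, 0.3):
    print(p, round(expected_heterozygosity(p), 2))
# 0.1 0.18
# 0.2 0.32
# 0.3 0.42
```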

In another embodiment, biallelic markers are detected by sequencing individual DNA
samples; in this case, the frequency of the minor allele of such a biallelic marker may be less than 0.1.

In a particular embodiment of the invention, the test samples are a pool of 100 individuals
and 50 individual samples. This is the methodology used in the preferred embodiment of the present
invention, in which one biallelic marker has been identified in a genomic region containing the hGGPS
gene. This biallelic marker is called 5-187-77 and is located in intron 3 of the hGGPPS gene. The
biallelic marker consists of an insertion of a nucleotide T.

The polymorphisms identified above can be further confirmed, and their respective
frequencies determined, through various methods using the previously described primers and
probes. These methods can also be useful for genotyping new populations in association
studies or linkage analysis, or for genotyping individuals in order to detect alleles of
biallelic markers known to be associated with a given trait. The genotyping of the biallelic
markers is also important for genetic mapping. It will be appreciated that the methods described below
can be performed equally well on individual or pooled DNA samples.

B) Genotyping Of Biallelic Markers

Once a given polymorphic site has been found and characterized as a biallelic marker as 
described above, several methods can be used in order to determine the specific allele carried by an 
individual at the given polymorphic base. 
The identification of biallelic markers described previously allows the design of appropriate
oligonucleotides, which can be used as probes and primers to amplify a hGGPS gene containing the
polymorphic site of interest and to detect such polymorphisms.

In one embodiment, the invention encompasses methods of genotyping comprising
determining the identity of a nucleotide at a hGGPPS-related biallelic marker or the complement
thereof in a biological sample; optionally, wherein said hGGPPS-related biallelic marker is the
biallelic marker 5-187-77, and the complement thereof; optionally, wherein said biological sample is
derived from a single subject; optionally, wherein the identity of the nucleotides at said biallelic
marker is determined for both copies of said biallelic marker present in said individual's genome;
optionally, wherein said biological sample is derived from multiple subjects. Optionally, the
genotyping methods of the invention encompass methods with any further limitation described in
this disclosure, or those following, specified alone or in any combination; optionally, said method
is performed in vitro; optionally, the method further comprises amplifying a portion of said sequence
comprising the biallelic marker prior to said determining step; optionally, wherein said amplifying
is performed by PCR, LCR, or replication of a recombinant vector comprising an origin of
replication and said fragment in a host cell; optionally, wherein said determining is performed by a
hybridization assay, a sequencing assay, a microsequencing assay, or an enzyme-based mismatch
detection assay.

1) Amplification 

Methods and polynucleotides are provided to amplify a segment of nucleotides comprising
one or more biallelic markers of the present invention. It will be appreciated that amplification of
DNA fragments comprising biallelic markers may be used in various methods and for various
purposes and is not restricted to genotyping. Nevertheless, many, although not all, genotyping
methods require the prior amplification of the DNA region carrying the biallelic marker of interest.
Such methods specifically increase the concentration or total number of sequences that span the
biallelic marker or include that site and sequences located either distal or proximal to it. Diagnostic
assays may also rely on amplification of DNA segments carrying a biallelic marker of the present
invention. Amplification of DNA may be achieved by any method known in the art. Amplification
techniques are described above in the section entitled "DNA amplification".

Some of these amplification methods are particularly suited for the detection of single 
nucleotide polymorphisms and allow the simultaneous amplification of a target sequence and the 
identification of the polymorphic nucleotide as it is further described below. 
The identification of biallelic markers as described above allows the design of appropriate
oligonucleotides, which can be used as primers to amplify DNA fragments comprising the biallelic
markers of the present invention. Amplification can be performed using the primers initially used to
discover new biallelic markers which are described herein, or any set of primers allowing the
amplification of a DNA fragment comprising a biallelic marker of the present invention.
In some embodiments, the present invention provides primers for amplifying a DNA
fragment containing one or more biallelic markers of the present invention. Preferred amplification
primers are listed in Example 3. It will be appreciated that the primers listed are merely exemplary
and that any other set of primers that produces amplification products containing one or more
biallelic markers of the present invention is also of use.
The spacing of the primers determines the length of the segment to be amplified. In the
context of the present invention, amplified segments carrying biallelic markers can range in size
from at least about 25 bp to 35 kbp. Amplification fragments of 25-3000 bp are typical,
fragments of 50-1000 bp are preferred and fragments of 100-600 bp are highly preferred. It
will be appreciated that amplification primers for the biallelic markers may be any sequence that
allows the specific amplification of any DNA fragment carrying the markers. Amplification primers
may be labeled or immobilized on a solid support as described in "Oligonucleotide probes and
primers".
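The nested size ranges above can be expressed as a simple classification. This sketch treats the approximate bounds ("about 25 bp", "35 kbp") as hard cut-offs, which is a simplifying assumption, and the function name is hypothetical.

```python
def size_class(length_bp: int) -> str:
    """Classify an amplicon length against the nested ranges given in the text."""
    if 100 <= length_bp <= 600:
        return "highly preferred"
    if 50 <= length_bp <= 1000:
        return "preferred"
    if 25 <= length_bp <= 3000:
        return "typical"
    if 25 <= length_bp <= 35_000:
        return "within the stated range"
    return "outside the stated range"


for length in (420, 800, 2500, 12_000, 20):
    print(length, size_class(length))
```

The checks run from the narrowest range outward, so each length receives the most specific label that applies.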

2) Sequencing 

The nucleotide present at a polymorphic site can be determined by sequencing methods. In
a preferred embodiment, DNA samples are subjected to PCR amplification before sequencing as
described above. DNA sequencing methods are described in "Sequencing Of Amplified Genomic
DNA And Identification Of Polymorphisms".

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing
reactions using a dye-primer cycle sequencing protocol. Sequence analysis allows the identification
of the base present at the biallelic marker site.
3) Microsequencing

In microsequencing methods, the nucleotide at a polymorphic site in a target DNA is
detected by a single nucleotide primer extension reaction. This method involves appropriate
microsequencing primers that hybridize just upstream of the polymorphic base of interest in the
target nucleic acid. A polymerase is used to specifically extend the 3' end of the primer with one
single ddNTP (chain terminator) complementary to the nucleotide at the polymorphic site. Next, the
identity of the incorporated nucleotide is determined in any suitable way.
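A toy model of single-base extension helps make the primer geometry concrete. It assumes the primer is identical to the sense strand immediately 5' of the polymorphic site, so that it anneals to the antisense strand and the incorporated ddNTP matches the sense-strand base at the site; combining the calls from both chromosome copies then gives a genotype. Function names and sequences are hypothetical.

```python
def microsequence(sense: str, primer: str) -> str:
    """Base added to the primer's 3' end in a single-ddNTP extension (toy model).

    The primer is assumed identical to the sense strand just 5' of the site, so the
    incorporated ddNTP equals the sense-strand base at the polymorphic position.
    """
    start = sense.find(primer)
    if start == -1:
        raise ValueError("primer does not anneal to this sequence")
    site = start + len(primer)
    if site >= len(sense):
        raise ValueError("no base 3' of the primer")
    return sense[site]  # chain terminator: exactly one base is added


def genotype(primer: str, copy1: str, copy2: str) -> str:
    """Combine the calls for both copies of the marker, e.g. 'C/T' or 'T/T'."""
    calls = sorted({microsequence(copy1, primer), microsequence(copy2, primer)})
    return "/".join(calls) if len(calls) == 2 else f"{calls[0]}/{calls[0]}"


primer = "GATTACAGGC"                      # hypothetical microsequencing primer
allele_c = "TT" + primer + "C" + "AAGT"    # hypothetical allele 1
allele_t = "TT" + primer + "T" + "AAGT"    # hypothetical allele 2
print(genotype(primer, allele_c, allele_t))  # C/T
```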

Typically, microsequencing reactions are carried out using fluorescent ddNTPs, and the
extended microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing
machines to determine the identity of the incorporated nucleotide as described in EP 412 883, the
disclosure of which is incorporated herein by reference in its entirety. Alternatively, capillary
electrophoresis can be used in order to process a higher number of assays simultaneously. An
example of a typical microsequencing procedure that can be used in the context of the present
invention is provided in Example 5.

Different approaches can be used for the labeling and detection of ddNTPs. A homogeneous
phase detection method based on fluorescence resonance energy transfer has been described by Chen
and Kwok (1997) and Chen et al. (1997). Alternatively, the extended primer may be analyzed by
MALDI-TOF Mass Spectrometry. The base at the polymorphic site is identified by the mass added
onto the microsequencing primer (see Haff and Smirnov, 1997).

Microsequencing may be achieved by the established microsequencing method or by
developments or derivatives thereof. Alternative methods include several solid-phase
microsequencing techniques. The basic microsequencing protocol is the same as described
previously, except that the method is conducted as a heterogeneous phase assay, in which the primer
or the target molecule is immobilized or captured onto a solid support. For example, immobilization
can be carried out via an interaction between biotinylated DNA and streptavidin-coated
microtitration wells or avidin-coated polystyrene particles. In the same manner, oligonucleotides or
templates may be attached to a solid support in a high-density format. In such solid-phase
microsequencing reactions, incorporated ddNTPs can be radiolabeled (Syvanen, 1994) or linked to
fluorescein (Livak and Hamer, 1994). The detection of radiolabeled ddNTPs can be achieved
through scintillation-based techniques. The detection of fluorescein-linked ddNTPs can be based on
the binding of antifluorescein antibody conjugated with alkaline phosphatase, followed by
incubation with a chromogenic substrate (such as p-nitrophenyl phosphate). Other possible
reporter-detection pairs include: ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline
phosphatase conjugate (Harju et al., 1993), or biotinylated ddNTP and horseradish
peroxidase-conjugated streptavidin with o-phenylenediamine as a substrate (WO 92/15712). As yet
another alternative solid-phase microsequencing procedure, Nyren et al. (1993) described a method
relying on the detection of DNA polymerase activity by an enzymatic luminometric inorganic
pyrophosphate detection assay (ELIDA).

Pastinen et al. (1997) describe a method for multiplex detection of single nucleotide
polymorphisms in which the solid phase minisequencing principle is applied to an oligonucleotide
array format. High-density arrays of DNA probes attached to a solid support (DNA chips) are
further described below.

In one aspect, the present invention provides polynucleotides and methods to genotype one
or more biallelic markers of the present invention by performing a microsequencing assay. Preferred
microsequencing primers include the nucleotide sequence of SEQ ID No 7. It will be appreciated
that the microsequencing primer of SEQ ID No 7 is merely exemplary and that any primer having a
3' end immediately adjacent to the polymorphic nucleotide may be used. Similarly, it will be
appreciated that microsequencing analysis may be performed for any biallelic marker or any
combination of biallelic markers of the present invention. One aspect of the present invention is a
solid support which includes one or more microsequencing primers for determining the identity of a
nucleotide at a biallelic marker site.

4) Mismatch detection assays based on polymerases and ligases

In one aspect, the present invention provides polynucleotides and methods to determine the
allele of one or more biallelic markers of the present invention in a biological sample by mismatch
detection assays based on polymerases and/or ligases. These assays are based on the specificity of
polymerases and ligases. Polymerization reactions place particularly stringent requirements on
correct base pairing of the 3' end of the amplification primer, and the joining of two oligonucleotides
hybridized to a target DNA sequence is quite sensitive to mismatches close to the ligation site,
especially at the 3' end. Methods, primers and various parameters to amplify DNA fragments
comprising biallelic markers of the present invention are further described above in "DNA
amplification".

Allele Specific Amplification Primers 

Discrimination between the two alleles of a biallelic marker can also be achieved by
allele-specific amplification, a selective strategy whereby one of the alleles is amplified without
amplification of the other allele. For allele-specific amplification, at least one member of the pair of
primers is sufficiently complementary with a region of a hGGPPS gene comprising the polymorphic
base of a biallelic marker of the present invention to hybridize therewith and to initiate the
amplification. Such primers are able to discriminate between the two alleles of a biallelic marker.

This is accomplished by placing the polymorphic base at the 3' end of one of the
amplification primers. Because extension proceeds from the 3' end of the primer, a mismatch at or
near this position has an inhibitory effect on amplification. Therefore, under appropriate
amplification conditions, these primers only direct amplification on their complementary allele.
Determining the precise location of the mismatch and the corresponding assay conditions is well
within the ordinary skill in the art.
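The 3'-terminal mismatch principle can be sketched as follows. The rule used here (extension only when the 3'-terminal base pairs correctly, with a configurable tolerance for internal mismatches) is a deliberate simplification of polymerase behavior, and the helper names and sequences are hypothetical.

```python
def allele_specific_primer(sense_upstream: str, allele: str, length: int = 20) -> str:
    """Primer whose 3'-terminal base sits on the polymorphic site (hypothetical helper)."""
    return sense_upstream[-(length - 1):] + allele


def amplifies(primer: str, target_segment: str, max_internal_mismatches: int = 0) -> bool:
    """Toy extension rule: the 3'-terminal base must match; internal mismatches are limited.

    target_segment is written in the same sense as the primer (the sequence the
    primer would be identical to on a perfectly matching allele).
    """
    if len(primer) != len(target_segment):
        return False
    if primer[-1] != target_segment[-1]:
        return False  # 3' mismatch: polymerase extension is inhibited
    internal = sum(a != b for a, b in zip(primer[:-1], target_segment[:-1]))
    return internal <= max_internal_mismatches


upstream = "ACGTTGCAAGGCTTACCGATG"   # hypothetical sequence 5' of the marker
primer_c = allele_specific_primer(upstream, "C")
seg_c = upstream[-19:] + "C"        # allele carrying C
seg_t = upstream[-19:] + "T"        # allele carrying T
print(amplifies(primer_c, seg_c), amplifies(primer_c, seg_t))  # True False
```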
Ligation/Amplification Based Methods

The "Oligonucleotide Ligation Assay" (OLA) uses two oligonucleotides that are designed
to be capable of hybridizing to abutting sequences of a single strand of a target molecule. One of
the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise
complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that
their termini abut, and create a ligation substrate that can be captured and detected. OLA is capable
of detecting single nucleotide polymorphisms and may be advantageously combined with PCR as
described by Nickerson et al. (1990). In this method, PCR is used to achieve the exponential
amplification of target DNA, which is then detected using OLA.
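The abutment requirement in OLA can be modeled by checking that two probes match contiguous stretches of the target. For simplicity, the probes are written in the same sense as the target segments they pair with, and the upstream probe is assumed to end on the polymorphic base, so a pair of allele-specific upstream probes used with a common downstream probe discriminates the two alleles. All names and sequences are hypothetical.

```python
def ola_signal(target: str, upstream_probe: str, downstream_probe: str) -> bool:
    """True if both probes match abutting target segments exactly (toy OLA model)."""
    start = target.find(upstream_probe)
    while start != -1:
        junction = start + len(upstream_probe)
        if target.startswith(downstream_probe, junction):
            return True  # termini abut on a perfect match: ligation product forms
        start = target.find(upstream_probe, start + 1)
    return False


left, right = "TACCTTAGC", "ATGCAAGT"       # hypothetical flanking sequences
target_c = "GG" + left + "C" + right        # allele carrying C
target_t = "GG" + left + "T" + right        # allele carrying T
probe_c, probe_t = "CCTTAGC" + "C", "CCTTAGC" + "T"  # allele-specific, labeled
capture = "ATGCAAG"                          # common probe, e.g. biotinylated

print(ola_signal(target_c, probe_c, capture), ola_signal(target_c, probe_t, capture))  # True False
```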

Other amplification methods that are particularly suited for the detection of single
nucleotide polymorphisms include LCR (ligase chain reaction) and Gap LCR (GLCR), which are
described above in "DNA Amplification". LCR uses two pairs of probes to exponentially amplify a
specific target. The sequences of each pair of oligonucleotides are selected to permit the pair to
hybridize to abutting sequences of the same strand of the target. Such hybridization forms a
substrate for a template-dependent ligase. In accordance with the present invention, LCR can be
performed with oligonucleotides having the proximal and distal sequences of the same strand of a
biallelic marker site. In one embodiment, either oligonucleotide will be designed to include the
biallelic marker site. In such an embodiment, the reaction conditions are selected such that the
oligonucleotides can be ligated together only if the target molecule either contains or lacks the
specific nucleotide that is complementary to the biallelic marker on the oligonucleotide. In an
alternative embodiment, the oligonucleotides will not include the biallelic marker, such that when
they hybridize to the target molecule, a "gap" is created as described in WO 90/01069. This gap is
then "filled" with complementary dNTPs (as mediated by DNA polymerase), or by an additional
pair of oligonucleotides. Thus, at the end of each cycle, each single strand has a complement capable
of serving as a target during the next cycle, and exponential allele-specific amplification of the
desired sequence is obtained.

Ligase/Polymerase-mediated Genetic Bit Analysis™ is another method for determining the
identity of a nucleotide at a preselected site in a nucleic acid molecule (WO 95/21271). This method
involves the incorporation of a nucleoside triphosphate that is complementary to the nucleotide
present at the preselected site onto the terminus of a primer molecule, and its subsequent ligation
to a second oligonucleotide. The reaction is monitored by detecting a specific label attached to the
reaction's solid phase or by detection in solution.

5) Hybridization Assay Methods

A preferred method of determining the identity of the nucleotide present at a biallelic marker
site involves nucleic acid hybridization. The hybridization probes, which can be conveniently used
in such reactions, preferably include the probes defined herein. Any hybridization assay may be
used, including Southern hybridization, Northern hybridization, dot blot hybridization and
solid-phase hybridization (see Sambrook et al., 1989).

Specific probes can be designed that hybridize to one form of a biallelic marker and not to 
the other and therefore are able to discriminate between different allelic forms. Allele-specific 
probes are often used in pairs, one member of a pair showing perfect match to a target sequence 
containing the original allele and the other showing a perfect match to the target sequence containing 
the alternative allele. Hybridization conditions should be sufficiently stringent that there is a 
significant difference in hybridization intensity between alleles, and preferably an essentially binary 
response, whereby a probe hybridizes to only one of the alleles. Stringent, sequence-specific
hybridization conditions under which a probe will hybridize only to the exactly complementary
target sequence are well known in the art (Sambrook et al., 1989). Although such hybridization can
be performed in solution, it is preferred to employ a solid-phase hybridization assay. The target 
DNA comprising a biallelic marker of the present invention may be amplified prior to the 
hybridization reaction. The presence of a specific allele in the sample is determined by detecting the 
presence or the absence of stable hybrid duplexes formed between the probe and the target DNA. 
The detection of hybrid duplexes can be carried out by a number of methods. Various detection 
assay formats are well known which utilize detectable labels bound to either the target or the probe 
to enable detection of the hybrid duplexes. Typically, hybridization duplexes are separated from 
unhybridized nucleic acids and the labels bound to the duplexes are then detected. Those skilled in 
the art will recognize that wash steps may be employed to wash away excess target DNA or probe as 
well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting 
the hybrids using the labels present on the primers and probes. 
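When an allele-specific probe pair gives the essentially binary response described above, a genotype can be read from the two signals. The following Python sketch shows one possible decision rule. The minimum total signal and the homozygosity cutoff are hypothetical, assay-specific values and do not come from this specification.

```python
def call_genotype(signal_allele_1: float, signal_allele_2: float,
                  min_total: float = 100.0,
                  homozygous_cutoff: float = 0.8) -> str:
    """Call a biallelic genotype from an allele-specific probe pair.

    `min_total` and `homozygous_cutoff` are hypothetical parameters that
    would have to be set empirically for a given assay.
    """
    total = signal_allele_1 + signal_allele_2
    if total < min_total:
        return "no call"  # too little hybridization to interpret
    fraction_1 = signal_allele_1 / total
    if fraction_1 >= homozygous_cutoff:
        return "1/1"  # essentially only the allele-1 probe hybridized
    if fraction_1 <= 1.0 - homozygous_cutoff:
        return "2/2"  # essentially only the allele-2 probe hybridized
    return "1/2"  # both probes hybridized: heterozygote


print(call_genotype(900, 50))   # 1/1
print(call_genotype(500, 480))  # 1/2
```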

Two recently developed assays allow hybridization-based allele discrimination with no need 
for separations or washes (see Landegren U. et al., 1998). The TaqMan assay takes advantage of 
the 5' nuclease activity of Taq DNA polymerase to digest a DNA probe annealed specifically to the
accumulating amplification product. TaqMan probes are labeled with a donor-acceptor dye pair that 
interacts via fluorescence energy transfer. Cleavage of the TaqMan probe by the advancing 
polymerase during amplification dissociates the donor dye from the quenching acceptor dye, greatly 
increasing the donor fluorescence. All reagents necessary to detect two allelic variants can be 
assembled at the beginning of the reaction and the results are monitored in real time (see Livak et al., 
1995). In an alternative homogeneous hybridization-based procedure, molecular beacons are used
for allele discrimination. Molecular beacons are hairpin-shaped oligonucleotide probes that report
the presence of specific nucleic acids in homogeneous solutions. When they bind to their targets 
they undergo a conformational reorganization that restores the fluorescence of an internally 
quenched fluorophore (Tyagi et al., 1998). 
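In real-time formats such as TaqMan, each reporter dye's fluorescence is recorded cycle by cycle. A common way to summarize such a trace is the cycle at which it first crosses a fluorescence threshold. The Python sketch below computes that crossing with linear interpolation. The threshold-cycle convention and the example traces are assumptions added for illustration and are not taken from the cited references.

```python
from typing import Optional, Sequence


def threshold_cycle(readings: Sequence[float], threshold: float) -> Optional[float]:
    """Fractional cycle at which a fluorescence trace first reaches `threshold`.

    readings[0] is the reading after cycle 1. The crossing is linearly
    interpolated between the two cycles that bracket it.
    """
    previous = None
    for cycle, value in enumerate(readings, start=1):
        if value >= threshold:
            if previous is None:
                return float(cycle)
            prev_cycle, prev_value = previous
            return prev_cycle + (threshold - prev_value) / (value - prev_value)
        previous = (cycle, value)
    return None  # never crossed: that reporter's probe was not cleaved


# Hypothetical traces for the two allele-specific reporter dyes in one well.
dye_allele_1 = [1, 1, 2, 4, 8, 16, 32]
dye_allele_2 = [1, 1, 1, 1, 1, 1, 1]
print(threshold_cycle(dye_allele_1, 6.0))  # 4.5
print(threshold_cycle(dye_allele_2, 6.0))  # None
```

In this hypothetical well only the allele-1 reporter crosses the threshold, which would be read as a sample homozygous for allele 1.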
The polynucleotides provided herein can be used to produce probes which can be used in
hybridization assays for the detection of biallelic marker alleles in biological samples. These probes
are characterized in that they preferably comprise between 8 and 50 nucleotides, and in that they are
sufficiently complementary to a sequence comprising a biallelic marker of the present invention to
hybridize thereto and preferably sufficiently specific to discriminate between target sequences
that differ at only one nucleotide. A particularly preferred probe is 25 nucleotides in length.
Preferably the biallelic marker is within 4 nucleotides of the center of the polynucleotide probe. In
particularly preferred probes, the biallelic marker is at the center of said polynucleotide. Preferred
probes comprise a nucleotide sequence selected from the group consisting of amplicons listed in
Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising
at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. Preferred probes comprise a nucleotide
sequence selected from the group consisting of SEQ ID Nos 5 and 6 and the sequences
complementary thereto. In preferred embodiments the polymorphic base(s) are within 5, 4, 3, 2, or 1
nucleotides of the center of said polynucleotide, more preferably at the center of said
polynucleotide.
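The preferred geometry described above (a 25-nucleotide probe with the biallelic marker at its center) can be derived mechanically from a sequence flanking the marker. The Python sketch below does this for each allele. The input sequence, the 0-based marker index and the function name `centered_probes` are hypothetical, because the actual sequences of Table 1 and SEQ ID Nos 5 and 6 are not reproduced here.

```python
from typing import List, Sequence


def centered_probes(sequence: str, marker_index: int,
                    alleles: Sequence[str], length: int = 25) -> List[str]:
    """One probe per allele, `length` nt long with the marker at the center.

    `marker_index` is 0-based. `length` must be odd so that a single central
    position exists (25 nt leaves 12 flanking bases on each side).
    """
    if length % 2 == 0:
        raise ValueError("length must be odd to have a central position")
    half = length // 2
    start, end = marker_index - half, marker_index + half + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the marker")
    left = sequence[start:marker_index]
    right = sequence[marker_index + 1:end]
    return [left + allele + right for allele in alleles]


reference = "ACGT" * 10  # hypothetical 40-nt region, marker at index 20
probe_a, probe_g = centered_probes(reference, 20, ("A", "G"))
print(probe_a[12], probe_g[12])  # A G
```

The complements of these probes are equally valid, in line with "the sequences complementary thereto" above.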

Preferably the probes of the present invention are labeled or immobilized on a solid support.
Labels and solid supports are further described in "Oligonucleotide Probes and Primers". The
probes can be non-extendable as described in "Oligonucleotide Probes and Primers".

By assaying the hybridization to an allele-specific probe, one can detect the presence or
absence of a biallelic marker allele in a given sample. High-throughput parallel hybridization in
array format is specifically encompassed within "hybridization assays" and is described below.

6. Hybridization To Addressable Arrays Of Oligonucleotides

Hybridization assays based on oligonucleotide arrays rely on the differences in hybridization
stability of short oligonucleotides to perfectly matched and mismatched target sequence variants.
Efficient access to polymorphism information is obtained through a basic structure comprising
high-density arrays of oligonucleotide probes attached to a solid support (e.g., the chip) at selected
positions. Each DNA chip can contain thousands to millions of individual synthetic DNA probes
arranged in a grid-like pattern and miniaturized to the size of a dime.

The chip technology has already been applied with success in numerous cases. For example,
the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae mutant strains,
and in the protease gene of HIV-1 virus (Hacia et al., 1996; Shoemaker et al., 1996; Kozal et al.,
1996). Chips of various formats for use in detecting biallelic polymorphisms can be produced on a
customized basis by Affymetrix (GeneChip™), Hyseq (HyChip and HyGnostics), and Protogene
Laboratories.

In general, these methods employ arrays of oligonucleotide probes that are complementary
to target nucleic acid sequence segments from an individual, which target sequences include a
polymorphic marker. EP 785280 describes a tiling strategy for the detection of single nucleotide
polymorphisms. Briefly, arrays may generally be "tiled" for a large number of specific
polymorphisms. By "tiling" is generally meant the synthesis of a defined set of oligonucleotide
probes which is made up of a sequence complementary to the target sequence of interest, as well as
preselected variations of that sequence, e.g., substitution of one or more given positions with one or
more members of the basis set of nucleotides. Tiling strategies are further described in PCT
application No. WO 95/11995. Hybridization and scanning may be carried out as described in PCT
applications Nos. WO 92/10092 and WO 95/11995 and US Patent No. 5,424,186.
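The tiling definition above (a reference probe plus variants in which preselected positions are substituted with members of the basis set of nucleotides) can be written as a short enumeration. This Python sketch is an illustration only and is not the specific procedure of EP 785280 or WO 95/11995. The function name `tile` and the example probe are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def tile(reference: str, positions: Sequence[int], basis: str = "ACGT") -> List[str]:
    """Reference probe plus every variant with `basis` bases at `positions`.

    With k preselected (0-based, distinct) positions the set contains
    len(basis) ** k probes, and the unmodified reference is among them.
    """
    probes = []
    for bases in product(basis, repeat=len(positions)):
        probe = list(reference)
        for pos, base in zip(positions, bases):
            probe[pos] = base
        probes.append("".join(probe))
    return probes


print(tile("AAAA", [1]))  # ['AAAA', 'ACAA', 'AGAA', 'ATAA']
```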

Thus, in some embodiments, the chips may comprise an array of nucleic acid sequences of
fragments of about 15 nucleotides in length. In further embodiments, the chip may comprise an
array including at least one of the sequences selected from the group consisting of amplicons listed
in Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising
at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. In preferred embodiments the
polymorphic base is within 5, 4, 3, 2, or 1 nucleotides of the center of said polynucleotide, more
preferably at the center of said polynucleotide. In some embodiments, the chip may comprise an
array of at least 2, 3, 4, 5, 6, 7, 8 or more of these polynucleotides of the invention. Solid supports
and polynucleotides of the present invention attached to solid supports are further described in
"Oligonucleotide Probes And Primers".

7. Integrated Systems

Another technique which may be used to analyze polymorphisms involves multicomponent
integrated systems, which miniaturize and compartmentalize processes such as PCR and capillary
electrophoresis reactions in a single functional device. An example of such a technique is disclosed in
US Patent 5,589,136, which describes the integration of PCR amplification and capillary
electrophoresis in chips.

Integrated systems are mainly envisaged when microfluidic systems are used. These
systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer
included on a microchip. The movement of the samples is controlled by electric, electroosmotic
or hydrostatic forces applied across different areas of the microchip to create functional microscopic
valves and pumps with no moving parts.

For genotyping biallelic markers, the microfluidic system may integrate nucleic acid 
amplification, microsequencing, capillary electrophoresis and a detection method such as laser- 
induced fluorescence detection. 

Oligonucleotide Probes and Primers

Polynucleotides derived from the hGGPPS gene are useful in order to detect the presence of
at least one copy of a nucleotide sequence of SEQ ID No 1, or a fragment, complement, or variant
thereof in a test sample. Furthermore, polynucleotides derived from the hGGPPS gene can be used
to generate antisense polynucleotides or polynucleotides for the triple helix strategy.

Particularly preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-485, 547-632, 827-7291, 7385-13759, 13831-14062, 14671-15054, and
15252-17131.
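One way to check whether a given span of SEQ ID No 1 meets the language above (a minimum span length, and a minimum number of its positions falling within the listed ranges) is to count overlaps interval by interval. The Python sketch below is an illustrative reading of that language under the assumption that positions are 1-based and inclusive. It is not an authoritative construction of the claim scope, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Position ranges of SEQ ID No 1 listed above (1-based, inclusive).
RANGES_SEQ_ID_1: List[Tuple[int, int]] = [
    (1, 485), (547, 632), (827, 7291), (7385, 13759),
    (13831, 14062), (14671, 15054), (15252, 17131),
]


def covered_positions(span_start: int, span_end: int,
                      ranges: Sequence[Tuple[int, int]] = RANGES_SEQ_ID_1) -> int:
    """Number of positions of [span_start, span_end] inside the ranges.

    Assumes the ranges do not overlap one another, as is the case above.
    """
    total = 0
    for low, high in ranges:
        overlap = min(span_end, high) - max(span_start, low) + 1
        if overlap > 0:
            total += overlap
    return total


def span_qualifies(span_start: int, span_end: int, min_length: int = 12,
                   min_positions: int = 1,
                   ranges: Sequence[Tuple[int, int]] = RANGES_SEQ_ID_1) -> bool:
    """True if the span is long enough and covers enough listed positions."""
    length = span_end - span_start + 1
    return (length >= min_length
            and covered_positions(span_start, span_end, ranges) >= min_positions)


print(covered_positions(480, 500))  # 6 (positions 480-485)
```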

The invention also relates to nucleic acid probes characterized in that they hybridize
specifically, under the stringent hybridization conditions defined above, with a nucleic acid selected
from the group consisting of the nucleotide sequences 1-485, 547-632, 827-7291, 7385-13759,
13831-14062, 14671-15054, and 15252-17131 of SEQ ID No 1 or a variant thereof or a sequence
complementary thereto.

Particularly preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the nucleotide positions
834-1217 of SEQ ID No 2. Additional preferred probes and primers of the invention include
isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15,
18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 3
or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the
nucleotide positions 967-1351 of SEQ ID No 3.

The invention also relates to nucleic acid probes characterized in that they hybridize
specifically, under the stringent hybridization conditions defined above, with a nucleic acid selected
from the group consisting of the nucleotide sequences 834-1217 of SEQ ID No 2 and 967-1351 of
SEQ ID No 3, or a variant thereof or a sequence complementary thereto.

In one embodiment the invention encompasses isolated, purified, and recombinant
polynucleotides consisting of, or consisting essentially of, a contiguous span of 8 to 50 nucleotides of
any one of SEQ ID Nos 1-3 and the complement thereof, wherein said span includes a
hGGPPS-related biallelic marker in said sequence; optionally, wherein said hGGPPS-related biallelic marker
is the biallelic marker 5-187-77, and the complement thereof; optionally, wherein said contiguous
span is 18 to 50 nucleotides in length and said biallelic marker is within 4 nucleotides of the center
of said polynucleotide; optionally, wherein said polynucleotide consists of said contiguous span and
said contiguous span is 25 nucleotides in length and said biallelic marker is at the center of said
polynucleotide; optionally, wherein the 3' end of said contiguous span is present at the 3' end of said
polynucleotide; and optionally, wherein the 3' end of said contiguous span is located at the 3' end of
said polynucleotide and said biallelic marker is present at the 3' end of said polynucleotide. In a
preferred embodiment, said probes comprise, consist of, or consist essentially of a sequence
selected from SEQ ID Nos 5 and 6 and the complementary sequences thereto.

In another embodiment the invention encompasses isolated, purified and recombinant
polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of 8 to 50
nucleotides of SEQ ID Nos 1-3, or the complements thereof, wherein the 3' end of said contiguous
span is located at the 3' end of said polynucleotide, and wherein the 3' end of said polynucleotide is
located within 20 nucleotides upstream of a hGGPPS-related biallelic marker in said sequence;
optionally, wherein said hGGPPS-related biallelic marker is the biallelic marker 5-187-77, and the
complement thereof; optionally, wherein the 3' end of said polynucleotide is located 1 nucleotide
upstream of said hGGPPS-related biallelic marker in said sequence; and optionally, wherein said
polynucleotide consists essentially of a sequence of SEQ ID No 7.
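The positional rules above can be made concrete as a slice of the sequence: a primer of 8 to 50 nucleotides whose 3' end lies a given number of nucleotides upstream (5') of the marker on the written strand. A distance of 1 places the 3' end immediately next to the marker, and a distance of 0 makes the marker the 3'-terminal base. The Python sketch below uses a hypothetical sequence and a hypothetical function name, since SEQ ID No 7 is not reproduced here.

```python
def primer_3prime_near_marker(sequence: str, marker_index: int,
                              length: int = 20, distance: int = 1) -> str:
    """Stretch of `sequence` whose 3' end lies `distance` nt 5' of the marker.

    `marker_index` is 0-based on the strand written 5'->3'. distance=1 gives
    a primer ending immediately upstream of the marker; distance=0 makes the
    marker the primer's 3'-terminal base.
    """
    if not 8 <= length <= 50:
        raise ValueError("length outside the 8-50 nt range")
    if not 0 <= distance <= 20:
        raise ValueError("3' end must lie within 20 nt of the marker")
    end = marker_index - distance + 1  # exclusive slice end
    start = end - length
    if start < 0:
        raise ValueError("not enough upstream sequence")
    return sequence[start:end]


region = "ACGT" * 10  # hypothetical region, marker at index 20
print(primer_3prime_near_marker(region, 20, length=8, distance=1))  # ACGTACGT
print(primer_3prime_near_marker(region, 20, length=8, distance=0))  # CGTACGTA
```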

In a further embodiment, the invention encompasses isolated, purified, or recombinant
polynucleotides comprising, consisting of, or consisting essentially of a sequence selected from the
sequences of SEQ ID Nos 8 and 9.

In an additional embodiment, the invention encompasses polynucleotides for use in
hybridization assays, sequencing assays, and enzyme-based mismatch detection assays for
determining the identity of the nucleotide at a hGGPPS-related biallelic marker, as well as
polynucleotides for use in amplifying segments of nucleotides comprising a hGGPPS-related
biallelic marker; optionally, wherein said hGGPPS-related biallelic marker is the biallelic marker
5-187-77, and the complements thereof.

A probe or a primer according to the invention is between 8 and 1000 nucleotides in
length, or is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000
nucleotides in length. More particularly, the length of these probes and primers can range from 8,
10, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to 30
nucleotides. The appropriate length for primers and probes under a particular set of assay conditions
may be empirically determined by one of skill in the art. A preferred probe or primer consists of a
nucleic acid comprising a polynucleotide selected from the group of the nucleotide sequences of
SEQ ID Nos 5-9 or a fragment thereof or a complementary sequence thereto.

The formation of stable hybrids depends on the melting temperature (Tm) of the DNA. The
Tm depends on the length of the primer or probe, the ionic strength of the solution and the G+C
content. The higher the G+C content of the primer or probe, the higher the melting temperature,
because G:C pairs are held by three hydrogen bonds whereas A:T pairs have only two. The GC content in
the probes of the invention usually ranges between 10 and 75%, preferably between 35 and 60%,
and more preferably between 40 and 55%.
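The relationship between G+C content and Tm can be estimated with two widely used rules of thumb. The first is the Wallace rule, Tm ≈ 2 °C per A or T plus 4 °C per G or C, which is intended for short oligonucleotides. The second is the approximation Tm ≈ 64.9 + 41 × (nG + nC − 16.4) / N for longer ones. The Python sketch below applies both and checks the 40-55% GC window given above. Neither formula accounts for salt concentration or nearest-neighbor effects, and neither formula comes from this specification. For actual assay design, Tm should be determined empirically, as noted above.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in `seq`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> float:
    """Wallace rule: 2 degC per A/T + 4 degC per G/C (short oligos only)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2.0 * at + 4.0 * gc


def basic_tm(seq: str) -> float:
    """Approximation 64.9 + 41 * (nGC - 16.4) / N for longer oligos."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def in_preferred_gc_range(seq: str, low: float = 40.0, high: float = 55.0) -> bool:
    """True if the GC content falls within the more preferred 40-55% window."""
    return low <= gc_content(seq) <= high


probe = "GC" * 5 + "AT" * 5  # hypothetical 20-mer, 50% GC
print(gc_content(probe), in_preferred_gc_range(probe))  # 50.0 True
```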
The primers and probes can be prepared by any suitable method, including, for example,
cloning and restriction of appropriate sequences and direct chemical synthesis by a method such as
the phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et al. (1979),
the diethylphosphoramidite method of Beaucage et al. (1981) and the solid support method described
in EP 0 707 592.

Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs
such as, for example, peptide nucleic acids, which are disclosed in International Patent Application
WO 92/20702, and morpholino analogs, which are described in U.S. Patent Nos. 5,185,444,
5,034,506 and 5,142,047. The probe may have to be rendered "non-extendable" in that additional
dNTPs cannot be added to the probe. In and of themselves, analogs usually are non-extendable, and
nucleic acid probes can be rendered non-extendable by modifying the 3' end of the probe such that
the hydroxyl group is no longer capable of participating in elongation. For example, the 3' end of
the probe can be functionalized with the capture or detection label to thereby consume or otherwise
block the hydroxyl group. Alternatively, the 3' hydroxyl group simply can be cleaved, replaced or
modified. U.S. Patent Application Serial No. 07/049,061, filed April 19, 1993, describes
modifications which can be used to render a probe non-extendable.

Any of the polynucleotides of the present invention can be labeled, if desired, by
incorporating any label known in the art to be detectable by spectroscopic, photochemical,
biochemical, immunochemical, or chemical means. For example, useful labels include radioactive
substances (including ³²P, ³⁵S, ³H, ¹²⁵I), fluorescent dyes (including 5-bromodeoxyuridine,
fluorescein, acetylaminofluorene, digoxigenin) or biotin. Preferably, polynucleotides are labeled at
their 3' and 5' ends. Examples of non-radioactive labeling of nucleic acid fragments are described
in French Patent No. FR-78 10975 or by Urdea et al. (1988) or Sanchez-Pescador et al. (1988). In
addition, the probes according to the present invention may have structural characteristics that
allow signal amplification, such structural characteristics being, for example, branched
DNA probes such as those described by Urdea et al. (1991) or in European Patent No. EP 0 225 807
(Chiron).

A label can also be used to capture the primer, so as to facilitate the immobilization of either
the primer or a primer extension product, such as amplified DNA, on a solid support. A capture
label is attached to the primers or probes and can be a specific binding member which forms a
binding pair with the solid phase reagent's specific binding member (e.g., biotin and streptavidin).
Therefore, depending upon the type of label carried by a polynucleotide or a probe, it may be
employed to capture or to detect the target DNA. Further, it will be understood that the
polynucleotides, primers or probes provided herein may themselves serve as the capture label. For
example, in the case where a solid phase reagent's binding member is a nucleic acid sequence, it
may be selected such that it binds a complementary portion of a primer or probe to thereby
immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself
serves as the binding member, those skilled in the art will recognize that the probe will contain a
sequence or "tail" that is not complementary to the target. In the case where a polynucleotide primer
itself serves as the capture label, at least a portion of the primer will be free to hybridize with a
nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.

The probes of the present invention are useful for a number of purposes. Notably, they can be
used in Southern hybridization to genomic DNA. The probes can also be used to detect
PCR amplification products. They may also be used to detect mismatches in the hGGPPS gene or
mRNA using other techniques.

Any of the polynucleotides, primers and probes of the present invention can be conveniently
immobilized on a solid support. Solid supports are known to those skilled in the art and include the
walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads, nitrocellulose strips,
membranes, microparticles such as latex particles, sheep (or other animal) red blood cells, duracytes
and others. The solid support is not critical and can be selected by one skilled in the art. Thus, latex
particles, microparticles, magnetic or non-magnetic beads, membranes, plastic tubes, walls of
microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red blood cells and
duracytes are all suitable examples. Suitable methods for immobilizing nucleic acids on solid
phases include ionic, hydrophobic, covalent interactions and the like. A solid support, as used
herein, refers to any material which is insoluble, or can be made insoluble by a subsequent reaction.
The solid support can be chosen for its intrinsic ability to attract and immobilize the capture reagent.
Alternatively, the solid phase can retain an additional receptor which has the ability to attract and
immobilize the capture reagent. The additional receptor can include a charged substance that is
oppositely charged with respect to the capture reagent itself or to a charged substance conjugated to
the capture reagent. As yet another alternative, the receptor molecule can be any specific binding
member which is immobilized upon (attached to) the solid support and which has the ability to
immobilize the capture reagent through a specific binding reaction. The receptor molecule enables
the indirect binding of the capture reagent to a solid support material before the performance of the
assay or during the performance of the assay. The solid phase thus can be a plastic, derivatized
plastic, magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet,
bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, duracytes® and other
configurations known to those of ordinary skill in the art. The polynucleotides of the invention can
be attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12,
15, 20, or 25 distinct polynucleotides of the invention to a single solid support. In addition,
polynucleotides other than those of the invention may be attached to the same solid support as one or
more polynucleotides of the invention.

Consequently, the invention also deals with a method for detecting the presence of a nucleic
acid comprising at least a part of a nucleotide sequence selected from the group consisting of SEQ
ID Nos 1-3 in a sample, said method comprising the following steps:

a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes, which can
hybridize to a nucleotide sequence included in one of the nucleic acids of SEQ ID Nos 1-3, and the
sample to be assayed;

b) detecting the hybrid complex formed between the probe and a nucleic acid in the sample.

Preferably, the nucleic acid probe is selected from the group of polynucleotides consisting of
the nucleotide sequences SEQ ID Nos 5-9. In a first preferred embodiment of this detection method,
said nucleic acid probe or the plurality of nucleic acid probes are labeled with a detectable molecule.
In a second preferred embodiment of said method, said nucleic acid probe or the plurality of nucleic
acid probes has been immobilized on a substrate.
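The two steps of this method (contacting probes with the sample, then detecting hybrid complexes) can be mimicked in silico by searching each sample molecule for a site to which a probe would pair perfectly. The Python sketch below makes a simplifying assumption: a hybrid forms only on a perfect match, and because each sample molecule is treated as double-stranded, the probe may pair with either strand. Probe names and sequences are hypothetical and stand in for SEQ ID Nos 5-9, which are not reproduced here.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def detect_hybrids(probes: Dict[str, str], sample: List[str]) -> List[Tuple[str, int]]:
    """(probe name, molecule index) pairs where a perfect hybrid could form.

    Each sample molecule is given as one strand 5'->3' and treated as
    double-stranded: the probe pairs with the written strand if its reverse
    complement occurs there, or with the other strand if the probe sequence
    itself occurs in the written strand.
    """
    hits = []
    for name, probe in probes.items():
        site = revcomp(probe)
        for index, molecule in enumerate(sample):
            if site in molecule or probe in molecule:
                hits.append((name, index))
    return hits


probes = {"probe_1": "TCCTGACT"}  # hypothetical allele-specific probe
sample = ["TTGCAGTCAGGA", "TTGCGGTCAGGA"]
print(detect_hybrids(probes, sample))  # [('probe_1', 0)]
```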
The invention further concerns a kit for detecting the presence of a nucleic acid comprising
at least a part of a nucleotide sequence selected from the group consisting of SEQ ID Nos 1-3 in a
sample, said kit comprising:

a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize to a
nucleotide sequence included in one of the nucleic acids of SEQ ID Nos 1-3;

b) optionally, the reagents necessary for performing the hybridization reaction.

The nucleic acid probe or the plurality of nucleic acid probes that are included in the
detection kit described above may be selected from the group consisting of SEQ ID Nos 5-9. In a
first preferred embodiment of the detection kit, the nucleic acid probe or the plurality of nucleic acid
probes are labeled with a detectable molecule. In a second preferred embodiment of the detection kit,
the nucleic acid probe or the plurality of nucleic acid probes has been immobilized on a substrate.

Oligonucleotide arrays 

A substrate comprising a plurality of oligonucleotide primers or probes of the invention may 
be used either for detecting or amplifying targeted sequences in the hGGPPS gene and may also be 
used for detecting mutations in the coding or in the non-coding sequences of the hGGPPS gene. 

Any polynucleotide provided herein may be attached in overlapping areas or at random
locations on the solid support. Alternatively, the polynucleotides of the invention may be attached in
an ordered array wherein each polynucleotide is attached to a distinct region of the solid support
which does not overlap with the attachment site of any other polynucleotide. Preferably, such an
ordered array of polynucleotides is designed to be "addressable", where the distinct locations are
recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays
typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a
substrate in different known locations. The knowledge of the precise location of each
polynucleotide makes these "addressable" arrays particularly useful in hybridization
assays. Any addressable array technology known in the art can be employed with the
polynucleotides of the invention. One particular embodiment of these polynucleotide arrays is
known as GeneChip™, and has been generally described in US Patent 5,143,854 and PCT
publications WO 90/15070 and WO 92/10092. These arrays may generally be produced using
mechanical synthesis methods or light-directed synthesis methods which incorporate a combination
of photolithographic methods and solid phase oligonucleotide synthesis (Fodor et al., 1991). The
immobilization of arrays of oligonucleotides on solid supports has been rendered possible by the
development of a technology generally identified as "Very Large Scale Immobilized Polymer
Synthesis" (VLSIPS™) in which, typically, probes are immobilized in a high-density array on a
solid surface of a chip. Examples of VLSIPS™ technologies are provided in US Patents 5,143,854
and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO 95/11995, which
describe methods for forming oligonucleotide arrays through techniques such as light-directed
synthesis techniques. In designing strategies aimed at providing arrays of nucleotides immobilized
on solid supports, further presentation strategies were developed to order and display the
oligonucleotide arrays on the chips in an attempt to maximize hybridization patterns and sequence
information. Examples of such presentation strategies are disclosed in PCT Publications
WO 94/12305, WO 94/11530, WO 97/29212 and WO 97/31256.
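The essential property of an addressable array is a recorded, non-overlapping mapping from each location on the support to the polynucleotide attached there. With this mapping, a scanned signal can be assigned back to its probe. The following Python sketch models that bookkeeping only. The (row, column) addressing, the class name `AddressableArray` and the example probes are hypothetical, and the sketch does not reflect the data format of any commercial array.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Address = Tuple[int, int]  # hypothetical (row, column) location on the support


@dataclass
class AddressableArray:
    layout: Dict[Address, str] = field(default_factory=dict)

    def attach(self, address: Address, probe: str) -> None:
        """Record a probe at a location; each location holds one probe."""
        if address in self.layout:
            raise ValueError(f"address {address} is already occupied")
        self.layout[address] = probe

    def signals_by_probe(self, scan: Dict[Address, float]) -> Dict[str, List[float]]:
        """Map scanned intensities back to probes via the recorded layout.

        Replicate spots of the same probe are collected in one list; a
        location absent from the scan is reported as 0.0.
        """
        result: Dict[str, List[float]] = {}
        for address, probe in self.layout.items():
            result.setdefault(probe, []).append(scan.get(address, 0.0))
        return result


array = AddressableArray()
array.attach((0, 0), "AAA")
array.attach((0, 1), "CCC")
array.attach((1, 0), "AAA")  # replicate spot
print(array.signals_by_probe({(0, 0): 10.0, (1, 0): 12.0}))
# {'AAA': [10.0, 12.0], 'CCC': [0.0]}
```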

In another embodiment of the oligonucleotide arrays of the invention, an oligonucleotide
probe matrix may advantageously be used to detect mutations occurring in the hGGPPS gene and
preferably in its regulatory region. For this particular purpose, probes are specifically designed to
have a nucleotide sequence allowing their hybridization to the genes that carry known mutations
(either by deletion, insertion or substitution of one or several nucleotides). By known mutations, it
is meant mutations in the hGGPPS gene that have been identified, for example, according to the
technique used by Huang et al. (1996) or Samson et al. (1996).

Another technique that is used to detect mutations in the hGGPPS gene is the use of a
high-density DNA array. Each oligonucleotide probe constituting a unit element of the high-density DNA
array is designed to match a specific subsequence of the hGGPPS genomic DNA or cDNA. Thus,
an array consisting of oligonucleotides complementary to subsequences of the target gene sequence
is used to determine the identity of the target sequence with the wild-type gene sequence, measure its
amount, and detect differences between the target sequence and the reference wild-type sequence of
the hGGPPS gene. One such design, termed a 4L tiled array, implements a set of four probes
(A, C, G, T), preferably 15-nucleotide oligomers, for each position of the target. In each set of four
probes, the perfect complement will hybridize more strongly than the mismatched probes. Consequently,
a nucleic acid target of length L is scanned for mutations with a tiled array containing 4L probes, the
whole probe set containing all the possible mutations in the known wild-type reference sequence. The
hybridization signals of the 15-mer probe set tiled array are perturbed by a single base change in the
target sequence. As a consequence, there is a characteristic loss of signal, or "footprint", for the
probes flanking a mutation position. This technique was described by Chee et al. in 1996.
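The 4L design and its footprint can be reproduced with a toy model in which hybridization strength is replaced by mismatch count: zero mismatches stands for full signal, and any mismatch stands for reduced signal. The Python sketch below builds the four 15-mer probes for each interrogated position, calls the base of the best-matching probe, and flags positions with no perfect match. For simplicity, the probes are written on the same strand as the target rather than as its complement. Positions closer than 7 nucleotides to either end are skipped, so slightly fewer than 4L probes are generated. All sequences and names are hypothetical, and the sketch is not Chee et al.'s analysis method.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"


def tiled_probe_sets(reference: str, probe_len: int = 15) -> Dict[int, List[str]]:
    """For each interrogated position, four probes varying the central base."""
    half = probe_len // 2
    sets = {}
    for i in range(half, len(reference) - half):
        window = reference[i - half:i + half + 1]
        sets[i] = [window[:half] + b + window[half + 1:] for b in BASES]
    return sets


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def read_tiled_array(reference: str, target: str,
                     probe_len: int = 15) -> Dict[int, Tuple[str, bool]]:
    """Per position: (called base, whether some probe matched perfectly).

    A False flag marks the footprint of probes whose window overlaps a
    difference between target and reference.
    """
    half = probe_len // 2
    calls = {}
    for i, probes in tiled_probe_sets(reference, probe_len).items():
        window = target[i - half:i + half + 1]
        scores = [mismatches(p, window) for p in probes]
        best = min(scores)
        calls[i] = (BASES[scores.index(best)], best == 0)
    return calls


reference = "ACGT" * 10                         # hypothetical 40-nt wild type
target = reference[:20] + "G" + reference[21:]  # A->G substitution at index 20
calls = read_tiled_array(reference, target)
print(calls[20])             # ('G', True): the mutant base is read out
print(calls[19], calls[12])  # ('T', False) ('A', True)
```

Positions 13 to 27, whose probe windows overlap index 20, show the footprint. The mutated position itself is read correctly because one of its four probes carries the mutant base.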

Consequently, the invention concerns an array of nucleic acid molecules comprising at least
one polynucleotide described above as probes and primers. Preferably, the invention concerns an
array of nucleic acids comprising at least two polynucleotides described above as probes and primers.
A further object of the invention consists of an array of nucleic acid sequences comprising
either at least one of the sequences selected from the group consisting of SEQ ID Nos 5-9, the
sequences complementary thereto, a fragment thereof of at least 8, 10, 12, 15, 18, 20, 25, 30, or 40
consecutive nucleotides thereof, and at least one sequence comprising the biallelic marker 5-187-77
and the complements thereto.

The invention also pertains to an array of nucleic acid sequences comprising either at least
two of the sequences selected from the group consisting of SEQ ID Nos 5-9, the sequences
complementary thereto, a fragment thereof of at least 8, 10, 12, 15, 18, 20, 25, 30, or 40 consecutive
nucleotides thereof, and at least one sequence comprising the biallelic marker 5-187-77 and the
complements thereto.

Vectors for the expression of a regulatory or a coding polynucleotide according to the invention

Any of the regulatory polynucleotides or the coding polynucleotides of the invention may be
inserted into recombinant vectors for expression in a recombinant host cell or a recombinant host
organism.

Thus, the present invention also encompasses a family of recombinant vectors that contain
either a regulatory polynucleotide selected from the group consisting of the regulatory
polynucleotides derived from the hGGPPS gene, or a polynucleotide comprising the hGGPPS coding
sequence, or both.

More particularly, the present invention relates to expression vectors which include nucleic acids encoding the hGGPS protein having the amino acid sequence of SEQ ID No 4 described herein, under the control of either a regulatory sequence selected among the hGGPS regulatory polynucleotides or an exogenous regulatory sequence.

A recombinant expression vector comprising a nucleic acid selected from the group consisting of the 5' or 3' regulatory regions of hGGPPS, or biologically active fragments or variants thereof, is also part of the present invention.

Generally, a recombinant vector of the invention may comprise any of the polynucleotides described herein, including regulatory sequences and coding sequences, as well as any hGGPPS primer or probe as defined above. More particularly, the recombinant vectors of the present invention can comprise any of the polynucleotides described in the "hGGPPS cDNA Sequences" section, the "Coding Regions" section, the "Genomic sequences" section and the "Oligonucleotide Probes And Primers" section.

Some of the elements which can be found in the vectors of the present invention are described in further detail in the following sections.

a) Vectors

A recombinant vector according to the invention comprises, but is not limited to, a YAC (Yeast Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a cosmid, a plasmid or even a linear DNA molecule which may consist of chromosomal, non-chromosomal and synthetic DNA. Such a recombinant vector can comprise a transcriptional unit comprising an assembly of:

(1) a genetic element or elements having a regulatory role in gene expression, for example promoters or enhancers. Enhancers are cis-acting elements of DNA, usually about 10 to 300 bp in length, that act on the promoter to increase transcription;

(2) a structural or coding sequence which is transcribed into mRNA and eventually translated into a polypeptide; and

(3) appropriate transcription initiation and termination sequences. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of the translated protein by a host cell. Alternatively, where the recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.

Generally, recombinant expression vectors will include origins of replication, selectable markers permitting transformation of the host cell, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium.
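The requirement that the structural sequence be assembled in phase with the translation initiation and termination sequences can be checked computationally before cloning. The sketch below is a hypothetical helper, not part of the original disclosure. It assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA) and a DNA coding sequence written 5' to 3'.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets starting at its first base."""
    return [seq[i : i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def is_in_frame_orf(cds: str) -> bool:
    """True if `cds` starts with ATG, ends with a stop codon, is a whole
    number of codons long, and contains no premature (internal) stop codon."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    triplets = codons(cds)
    if triplets[0] != "ATG" or triplets[-1] not in STOP_CODONS:
        return False
    return not any(c in STOP_CODONS for c in triplets[:-1])
```

A frame shift introduced during construction usually makes this check fail, either through a length that is not a multiple of three or through a premature stop codon.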

The selectable marker genes for selection of transformed host cells are preferably dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, TRP1 for S. cerevisiae, tetracycline, rifampicin or ampicillin resistance in E. coli, or levan saccharase for mycobacteria.

As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega Biotec, Madison, WI, USA).

Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available, such as the bacterial vectors pQE70, pQE60, pQE-9 (Qiagen), pbs, pD10, phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); or the eukaryotic vectors pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); baculovirus transfer vector pVL1392/1393 (Pharmingen); pQE-30 (QIAexpress).

A suitable vector for the expression of the hGGPS polypeptide of SEQ ID No 4 is a baculovirus vector that can be propagated in insect cells and in insect cell lines. A specific suitable host vector system is the pVL1392/1393 baculovirus transfer vector (Pharmingen) that is used to transfect the SF9 cell line (ATCC N°CRL 1711), which is derived from Spodoptera frugiperda.

Other suitable vectors for the expression of the hGGPS polypeptide of SEQ ID No 4 in a baculovirus expression system include those described by Chai et al. (1993), Vlasak et al. (1983) and Lenhard et al. (1996).

Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation signal, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences.
DNA sequences derived from the SV40 viral genome, for example SV40 origin, early promoter, 
enhancer, splice and polyadenylation signals may be used to provide the required nontranscribed 
genetic elements. 

b) Promoters 

The suitable promoter regions used in the expression vectors according to the present 
invention are chosen taking into account the cell host in which the heterologous gene has to be 
expressed. 

A suitable promoter may be heterologous with respect to the nucleic acid for which it 
controls the expression or alternatively can be endogenous to the native polynucleotide containing 
the coding sequence to be expressed. Additionally, the promoter is generally heterologous with 
respect to the recombinant vector sequences within which the construct promoter/coding sequence 
has been inserted. 

Preferred bacterial promoters are the LacI and LacZ promoters, the T3 or T7 bacteriophage RNA polymerase promoters, the polyhedrin promoter, or the p10 protein promoter from baculovirus (Kit Novagen) (Smith et al., 1983; O'Reilly et al., 1992), the lambda PR promoter or also the trc promoter.

Promoter regions can be selected from any desired gene using, for example, CAT (chloramphenicol acetyltransferase) vectors and more preferably pKK232-8 and pCM7 vectors. Particularly preferred bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector and promoter is well within the level of ordinary skill in the art.

The choice of a promoter is well within the ability of a person skilled in the field of genetic engineering. For example, one may refer to the book of Sambrook et al. (1989) or to the procedures described by Fuller et al. (1996).

The vector containing the appropriate DNA sequence as described above, more preferably a 
hGGPS gene regulatory polynucleotide, a polynucleotide encoding the hGGPS polypeptide of SEQ 
ID No 4, or both of them, can be utilized to transform an appropriate host to allow the expression of
the desired polypeptide or polynucleotide. 

c) Other types of vectors 

The in vivo expression of a hGGPS polypeptide of SEQ ID No 4 may be useful in order to correct a genetic defect related to the expression of the native gene in a host organism or to the production of a biologically inactive hGGPS protein.

Consequently, the present invention also deals with recombinant expression vectors mainly designed for the in vivo production of the hGGPS polypeptide of SEQ ID No 4 by the introduction of the appropriate genetic material into the organism of the patient to be treated. This genetic material may be introduced either in vitro into a cell that has been previously extracted from the organism, the modified cell being subsequently reintroduced into said organism, or directly in vivo into the appropriate tissue, preferably the olfactory epithelium.

By "vector" according to this specific embodiment of the invention is meant either a circular or a linear DNA molecule.

One specific embodiment of a method for delivering a protein or peptide to the interior of a cell of a vertebrate in vivo comprises the step of introducing a preparation comprising a physiologically acceptable carrier and a naked polynucleotide operatively coding for the polypeptide of interest into the interstitial space of a tissue comprising the cell, whereby the naked polynucleotide is taken up into the interior of the cell and has a physiological effect.

In a specific embodiment, the invention provides a composition for the in vivo production of the hGGPS protein or polypeptide described herein. It comprises a naked polynucleotide operatively coding for this polypeptide, in solution in a physiologically acceptable carrier, and suitable for introduction into a tissue to cause cells of the tissue to express said protein or polypeptide.

Compositions comprising a polynucleotide are described in PCT application N° WO 90/11092 (Vical Inc.) and in PCT application N° WO 95/11307 (Institut Pasteur, INSERM, Université d'Ottawa), as well as in the articles of Tascon et al. (1996) and of Huygen et al. (1996).

The amount of vector to be injected into the desired host organism varies according to the site of injection. As an indicative dose, between 0.1 and 100 µg of the vector may be injected into an animal body, preferably a mammal body, for example a mouse body.

In another embodiment, the vector according to the invention may be introduced in vitro into a host cell, preferably a host cell previously harvested from the animal to be treated and more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that has been transformed with the vector coding for the desired hGGPS polypeptide or the desired C-terminal fragment thereof is reintroduced into the animal body in order to deliver the recombinant protein within the body either locally or systemically.

In one specific embodiment, the vector is derived from an adenovirus. Preferred adenovirus vectors according to the invention are those described by Feldman and Steg (1996) or Ohno et al. (1994). Another preferred recombinant adenovirus according to this specific embodiment of the present invention is the human adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of animal origin (French patent application N° FR-93.05954).

Retrovirus vectors and adeno-associated virus vectors are generally understood to be the recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides in vivo, particularly to mammals, including humans. These vectors provide efficient delivery of genes into cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host.

Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or in vivo gene delivery vehicles of the present invention include retroviruses selected from the group consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include the 4070A and the 1504A viruses, Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC No VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190; PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include Bryan high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Other preferred retroviral vectors are those described in Roth et al. (1996), PCT Application No WO 93/25234, PCT Application No WO 94/06920, Roux et al. (1989), Man et al. (1992) and Neda et al. (1991).

Yet another viral vector system that is contemplated by the invention is the adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective virus that requires another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient replication and a productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses that may integrate its DNA into non-dividing cells, and it exhibits a high frequency of stable integration (Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989). One advantageous feature of AAV derives from its reduced efficacy for transducing primary cells relative to transformed cells.

Other compositions containing a vector of the invention advantageously comprise an oligonucleotide fragment of a nucleic acid sequence selected from the group consisting of SEQ ID Nos 2 and 3 as an antisense tool that inhibits the expression of the corresponding hGGPS gene. Preferred methods using antisense polynucleotides according to the present invention are the procedures described by Sczakiel et al. (1995) or in PCT Application No WO 95/24223.

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that are complementary to the 5' end of the hGGPS mRNAs. In another embodiment, a combination of different antisense polynucleotides complementary to different parts of the desired target gene is used.

Preferred antisense polynucleotides according to the present invention are complementary to a sequence of the hGGPS mRNAs that contains the translation initiation codon ATG.
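Selecting an antisense polynucleotide complementary to the region of an mRNA that contains the ATG initiation codon can be sketched as follows. This is an illustration under stated assumptions: the first ATG in the supplied sequence is taken to be the translation initiation codon (in practice the annotated start of the coding region should be used), the oligonucleotide length is restricted to the 15-200 nucleotide range mentioned above, and the helper names are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def antisense_over_start_codon(mrna: str, length: int = 20) -> str:
    """Return a DNA antisense oligonucleotide (5'->3') covering the first ATG.

    Assumption: the first ATG is the initiation codon. The target window is
    placed roughly centred on it and shifted inward if it hits a sequence end.
    """
    if not 15 <= length <= 200:
        raise ValueError("length must be between 15 and 200 nucleotides")
    seq = mrna.upper().replace("U", "T")  # work in the DNA alphabet
    start = seq.find("ATG")
    if start == -1:
        raise ValueError("no ATG codon found")
    if len(seq) < length:
        raise ValueError("sequence shorter than the requested oligonucleotide")
    left = max(0, start + 1 - length // 2)
    right = min(len(seq), left + length)
    left = right - length  # keep the full length if clipped at the 3' end
    return reverse_complement(seq[left:right])
```

The oligonucleotide is returned as DNA because the antisense tools discussed above are polynucleotides; an RNA version would simply replace T with U.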
Another object of the invention consists of cell hosts that have been transformed or transfected with one of the polynucleotides described herein, and more precisely a polynucleotide comprising either a hGGPS regulatory polynucleotide or the coding sequence of the hGGPS polypeptide having the amino acid sequence of SEQ ID No 4. Included are cell hosts that are transformed (prokaryotic cells) or transfected (eukaryotic cells) with a recombinant vector such as those described above.

A cell host according to the present invention is characterized in that its genome or genetic background (including chromosomes and plasmids) is modified by the heterologous nucleic acid coding for the hGGPS polypeptide of SEQ ID No 4.

More particularly, the cell hosts of the present invention can comprise any of the polynucleotides described in the "hGGPPS cDNA Sequences" section, the "Coding Regions" section, the "Genomic sequences" section and the "Oligonucleotide Probes And Primers" section.

Preferred cell hosts used as recipients for the expression vectors of the invention are the following:

a) Prokaryotic host cells: Escherichia coli strains (e.g. the DH5-α strain) or Bacillus subtilis.

b) Eukaryotic host cells: HeLa cells (ATCC N°CCL2; N°CCL2.1; N°CCL2.2), CV-1 cells (ATCC N°CCL70), COS cells (ATCC N°CRL1650; N°CRL1651), Sf-9 cells (ATCC N°CRL1711).

The constructs in the host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence.

Following transformation of a suitable host and growth of the host to an appropriate cell 
density, the selected promoter is induced by appropriate means, such as temperature shift or 
chemical induction, and cells are cultivated for an additional period. 

Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.

Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to the skilled artisan.

Cell hosts can be used to generate transgenic animals. Therefore, the invention concerns a non-human host animal or mammal comprising a recombinant vector or a host cell according to the invention. More particularly, the invention concerns a mammalian host cell or a non-human host mammal comprising a hGGPPS gene disrupted by homologous recombination with a knockout vector and comprising a polynucleotide according to the invention.

hGGPPS Proteins and Polypeptide Fragments: 

The term "hGGPPS polypeptides" is used herein to embrace all of the proteins and polypeptides of the present invention. Also forming part of the invention are polypeptides encoded by the polynucleotides of the invention, as well as fusion polypeptides comprising such polypeptides. The invention embodies hGGPPS proteins from humans, including isolated or purified hGGPPS proteins consisting of, consisting essentially of, or comprising the sequence of SEQ ID No 4. It should be noted that the hGGPPS proteins of the invention are based on the naturally-occurring variant of the amino acid sequence of human hGGPPS, wherein a phenylalanine residue is at positions 204, 257 and 295 of SEQ ID No 4, a cysteine residue is at position 205 of SEQ ID No 4, a proline residue is at position 225 of SEQ ID No 4, and a glutamic acid residue is at position 252 of SEQ ID No 4.

The present invention embodies isolated, purified, and recombinant polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said contiguous span includes at least one amino acid selected from the group consisting of a Phe at positions 204, 257, 295 of SEQ ID No 4, a Cys at position 205 of SEQ ID No 4, a Pro at position 225 of SEQ ID No 4, and a Glu at position 252 of SEQ ID No 4. In other preferred embodiments, the contiguous stretch of amino acids comprises the site of a mutation or functional mutation, including a deletion, addition, swap or truncation of the amino acids in the hGGPPS protein sequence.
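Whether a given fragment falls within the spans defined above can be checked mechanically: it must be at least 6 residues long and must include at least one of the listed variant residues (Phe at 204, 257 or 295, Cys at 205, Pro at 225, Glu at 252 of SEQ ID No 4). The sketch below assumes 1-based numbering and that the fragment is supplied together with the SEQ ID No 4 position of its first residue. The full sequence of SEQ ID No 4 is not reproduced here, and the function name is hypothetical.

```python
# Variant residues of SEQ ID No 4 named in the text (1-based position -> residue).
VARIANT_RESIDUES = {204: "F", 205: "C", 225: "P", 252: "E", 257: "F", 295: "F"}


def variant_sites_in_fragment(
    fragment: str, first_position: int, min_length: int = 6
) -> list[int]:
    """Return the variant positions that the fragment both spans and carries.

    `first_position` is the SEQ ID No 4 position of fragment[0]. An empty
    list means the fragment does not meet the criterion (too short, or no
    listed variant residue inside the span).
    """
    if len(fragment) < min_length:
        return []
    last_position = first_position + len(fragment) - 1
    hits = []
    for pos, residue in sorted(VARIANT_RESIDUES.items()):
        if first_position <= pos <= last_position:
            # Check that the fragment carries the variant residue at that site.
            if fragment[pos - first_position].upper() == residue:
                hits.append(pos)
    return hits
```

Raising `min_length` to 8, 10, 12 and so on reproduces the preferred span lengths listed above.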

hGGPPS proteins are preferably isolated from human or mammalian tissue samples or expressed from human or mammalian genes. The hGGPPS polypeptides of the invention can be made using routine expression methods known in the art. The polynucleotide encoding the desired polypeptide is ligated into an expression vector suitable for any convenient host. Both eukaryotic and prokaryotic host systems can be used to form recombinant polypeptides, including the more common systems summarized above. The polypeptide is then isolated from lysed cells or from the culture medium and purified to the extent needed for its intended use. Purification can be performed by any technique known in the art, for example differential extraction, salt fractionation, chromatography, centrifugation, and the like. See, for example, Methods in Enzymology for a variety of methods for purifying proteins.

In addition, shorter protein fragments can be produced by chemical synthesis. Alternatively, the proteins of the invention can be extracted from cells or tissues of humans or non-human animals. Methods for purifying proteins are known in the art and include the use of detergents or chaotropic agents to disrupt particles, followed by differential extraction and separation of the polypeptides by ion exchange chromatography, affinity chromatography, sedimentation according to density, and gel electrophoresis.

Any hGGPPS cDNA, including SEQ ID Nos 2 and 3, can be used to express hGGPPS proteins and polypeptides. The nucleic acid encoding the hGGPPS protein or polypeptide to be expressed is operably linked to a promoter in an expression vector using conventional cloning technology. The hGGPPS insert in the expression vector may comprise the full coding sequence for the hGGPPS protein or a portion thereof. For example, the hGGPPS-derived insert may encode a polypeptide comprising at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 consecutive amino acids of the hGGPPS protein of SEQ ID No 4, wherein said consecutive amino acids comprise at least one amino acid selected from the group consisting of a Phe at positions 204, 257, 295 of SEQ ID No 4, a Cys at position 205 of SEQ ID No 4, a Pro at position 225 of SEQ ID No 4, and a Glu at position 252 of SEQ ID No 4.

The expression vector may be any of the mammalian, yeast, insect or bacterial expression systems known in the art. Commercial vectors and expression systems are available from a variety of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism into which the expression vector is introduced, as explained by Hatfield et al., U.S. Patent No. 5,082,767, the disclosures of which are incorporated by reference herein in their entirety.

In one embodiment, the entire coding sequence of the hGGPPS cDNA through the poly A signal of the cDNA is operably linked to a promoter in the expression vector. Alternatively, if the nucleic acid encoding a portion of the hGGPPS protein lacks a methionine to serve as the initiation site, an initiating methionine can be introduced next to the first codon of the nucleic acid using conventional techniques. Similarly, if the insert from the hGGPPS cDNA lacks a poly A signal, this sequence can be added to the construct by, for example, splicing out the poly A signal from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex Thymidine Kinase promoter and the selectable neomycin gene.
The nucleic acid encoding the hGGPPS protein or a portion thereof can be obtained by PCR from a bacterial vector containing the hGGPPS cDNA of SEQ ID Nos 2 and 3, using oligonucleotide primers complementary to the hGGPPS cDNA or portion thereof and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding cDNA 3' primer, taking care to ensure that the sequence encoding the hGGPPS protein or a portion thereof is positioned properly with respect to the poly A signal. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A signal and digested with BglII.
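The primer design described above, with a PstI site added to the 5' primer and a BglII site at the 5' end of the 3' primer, can be sketched as follows. The recognition sequences CTGCAG (PstI) and AGATCT (BglII) are the standard ones; the 18-nucleotide annealing length, the 4-nucleotide 5' clamp and the function names are illustrative assumptions. Real primers would also need to be checked for melting temperature and secondary structure.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence (palindromic)
BGLII_SITE = "AGATCT"  # BglII recognition sequence (palindromic)
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def design_cloning_primers(
    insert: str, anneal_len: int = 18, clamp: str = "GCGC"
) -> tuple[str, str]:
    """Return (5' primer, 3' primer) for amplifying `insert` (sense strand).

    5' primer: hypothetical clamp + PstI site + first `anneal_len` bases.
    3' primer: hypothetical clamp + BglII site + reverse complement of the
    last `anneal_len` bases, so BglII sits at the 5' end behind the clamp.
    """
    insert = insert.upper()
    if len(insert) < anneal_len:
        raise ValueError("insert shorter than the annealing region")
    # Both sites are palindromic, so scanning the sense strand is sufficient.
    for name, site in (("PstI", PSTI_SITE), ("BglII", BGLII_SITE)):
        if site in insert:
            # An internal site would also be cut during the digestion steps.
            raise ValueError(f"insert contains an internal {name} site")
    forward = clamp + PSTI_SITE + insert[:anneal_len]
    reverse = clamp + BGLII_SITE + reverse_complement(insert[-anneal_len:])
    return forward, reverse
```

Rejecting inserts with internal PstI or BglII sites reflects the digestion steps above: such a site would cut the insert itself rather than only its ends.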

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri).

The above procedures may also be used to express a mutant hGGPPS protein responsible for a detectable phenotype, or a portion thereof.

The expressed protein is purified using conventional purification techniques such as ammonium sulfate precipitation or chromatographic separation based on size or charge. The protein encoded by the nucleic acid insert may also be purified using standard immunochromatography techniques. In such procedures, a solution containing the expressed hGGPPS protein or portion thereof, such as a cell extract, is applied to a column in which antibodies against the hGGPPS protein or portion thereof are attached to the chromatography matrix. The expressed protein is allowed to bind the immunochromatography column. Thereafter, the column is washed to remove non-specifically bound proteins. The specifically bound expressed protein is then released from the column and recovered using standard techniques.

To confirm expression of the hGGPPS protein or a portion thereof, the proteins expressed from host cells containing an expression vector with an insert encoding the hGGPPS protein or a portion thereof can be compared to the proteins expressed in host cells containing the expression vector without an insert. The presence of a band in samples from cells containing the expression vector with an insert which is absent in samples from cells containing the expression vector without an insert indicates that the hGGPPS protein or a portion thereof is being expressed. Generally, the band will have the mobility expected for the hGGPPS protein or portion thereof. However, the band may have a mobility different from that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
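The comparison above, between samples from cells carrying the vector with and without the insert, amounts to finding bands present only in the insert lane and comparing their apparent size with the size expected from the encoded polypeptide. The sketch below assumes band sizes have already been estimated in kDa from a molecular-weight ladder, uses a hypothetical 2 kDa matching tolerance, and estimates the expected size with the common rule of thumb of about 110 Da per residue. None of these values come from the source.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rough average mass of one amino acid residue


def expected_size_kda(n_residues: int) -> float:
    """Approximate size (kDa) of an unmodified polypeptide of n_residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def insert_specific_bands(
    with_insert: list[float],
    without_insert: list[float],
    tolerance_kda: float = 2.0,
) -> list[float]:
    """Bands (kDa) seen with the insert but absent from the empty-vector lane.

    A band counts as absent from the control if no control band lies within
    `tolerance_kda` (a hypothetical matching tolerance) of it.
    """
    return [
        band
        for band in with_insert
        if all(abs(band - ctrl) > tolerance_kda for ctrl in without_insert)
    ]
```

As noted above, an insert-specific band whose apparent size differs from `expected_size_kda` is not necessarily spurious, since glycosylation, ubiquitination or cleavage can shift mobility.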

Antibodies capable of specifically recognizing the expressed hGGPPS protein or a portion thereof are described below.

If antibody production is not possible, the nucleic acids encoding the hGGPPS protein or a portion thereof can be incorporated into expression vectors designed for use in purification schemes employing chimeric polypeptides. In such strategies, the nucleic acid encoding the hGGPPS protein or a portion thereof is inserted in frame with the gene encoding the other half of the chimera. The other half of the chimera may be β-globin or a nickel-binding polypeptide encoding sequence. A chromatography matrix having antibody to β-globin or nickel attached thereto is then used to purify the chimeric protein. Protease cleavage sites can be engineered between the β-globin gene or the nickel-binding polypeptide and the hGGPPS protein or portion thereof. Thus, the two polypeptides of the chimera can be separated from one another by protease digestion.

One useful expression vector for generating β-globin chimeric proteins is pSG5 (Stratagene), which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. (1986), and many of the methods are available from Stratagene, Life Technologies, Inc., or Promega. Polypeptides may additionally be produced from the construct using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).

Antibodies That Bind hGGPPS Polypeptides of the Invention

Any hGGPPS polypeptide or whole protein may be used to generate antibodies capable of 
specifically binding to an expressed hGGPPS protein or fragments thereof as described. 

One antibody composition of the invention is capable of specifically binding to the variant of the hGGPPS protein of SEQ ID No 4. For an antibody composition to specifically bind to a first variant of hGGPPS, it must demonstrate at least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity for a full-length first variant of the hGGPPS protein than for a full-length second variant of the hGGPPS protein in an ELISA, RIA, or other antibody-based binding assay.
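The specificity criterion above is a relative comparison between two binding measurements. A minimal sketch, assuming that "binding affinity" is represented by a positive assay readout for each full-length variant measured under identical conditions (for example a background-corrected ELISA signal; this choice is an assumption, not specified in the text). The function names are hypothetical.

```python
def percent_greater_binding(first_variant: float, second_variant: float) -> float:
    """Percentage by which binding to the first variant exceeds the second."""
    if first_variant <= 0 or second_variant <= 0:
        raise ValueError("binding readouts must be positive")
    return (first_variant - second_variant) / second_variant * 100.0


def is_variant_specific(
    first_variant: float, second_variant: float, threshold_percent: float = 5.0
) -> bool:
    """Apply one of the thresholds named in the text (5, 10, 15, 20, 25, 50 or 100%)."""
    return percent_greater_binding(first_variant, second_variant) >= threshold_percent
```

For example, readouts of 1.5 and 1.0 correspond to 50% greater binding, which satisfies every threshold up to 50% but not the 100% threshold.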

A preferred embodiment of the polyclonal or monoclonal antibodies of the invention consists of antibodies raised against a C-terminal portion of the hGGPS polypeptide having the amino acid sequence of SEQ ID No 4, more preferably antibodies raised against a peptide fragment of the hGGPS polypeptide having the amino acid sequence starting from the amino acid at position 200 and ending at the amino acid at position 300 of the hGGPS polypeptide of SEQ ID No 4, or peptide fragments thereof.

In a preferred embodiment, the invention concerns antibody compositions, either polyclonal or monoclonal, capable of selectively binding to an epitope contained in a polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said epitope comprises at least one amino acid selected from the group consisting of a Phe at positions 204, 257, 295 of SEQ ID No 4, a Cys at position 205 of SEQ ID No 4, a Pro at position 225 of SEQ ID No 4, and a Glu at position 252 of SEQ ID No 4.

The invention also concerns a purified or isolated antibody capable of specifically binding to a mutated hGGPPS protein or to a fragment or variant thereof comprising an epitope of the mutated hGGPPS protein.

In a preferred embodiment, the invention concerns the use in the manufacture of antibodies of a polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said contiguous span comprises at least one amino acid selected from the group consisting of a Phe at positions 204, 257, 295 of SEQ ID No 4, a Cys at position 205 of SEQ ID No 4, a Pro at position 225 of SEQ ID No 4, and a Glu at position 252 of SEQ ID No 4.

Non-human animals or mammals, whether wild-type or transgenic, which express a different species of hGGPPS than the one to which antibody binding is desired, and animals which do not express hGGPPS (i.e. a hGGPPS knockout animal as described herein) are particularly useful for preparing antibodies. hGGPPS knockout animals will recognize all or most of the exposed regions of a hGGPPS protein as foreign antigens, and therefore produce antibodies with a wider array of hGGPPS epitopes. Moreover, smaller polypeptides with only 10 to 30 amino acids may be useful in obtaining specific binding to any one of the hGGPPS proteins. In addition, the humoral immune system of animals which produce a species of hGGPPS that resembles the antigenic sequence will preferentially recognize the differences between the animal's native hGGPPS species and the antigen sequence, and produce antibodies to these unique sites in the antigen sequence. Such a technique will be particularly useful in obtaining antibodies that specifically bind to any one of the hGGPPS proteins.

Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.

The antibodies of the invention may be labeled by any one of the radioactive, fluorescent or 
enzymatic labels known in the art. 

Consequently, the invention is also directed to a method for specifically detecting the presence of a hGGPPS polypeptide according to the invention in a biological sample, said method comprising the following steps:

a) bringing into contact the biological sample with a polyclonal or monoclonal antibody that specifically binds to a hGGPPS polypeptide comprising an amino acid sequence of SEQ ID No 4, or to a peptide fragment or variant thereof; and
b) detecting the antigen-antibody complex formed.

The invention also concerns a diagnostic kit for detecting in vitro the presence of a hGGPPS 
polypeptide according to the present invention in a biological sample, wherein said kit comprises: 

a) a polyclonal or monoclonal antibody that specifically binds to a hGGPPS polypeptide comprising an amino acid sequence of SEQ ID No 4, or to a peptide fragment or variant thereof, optionally labeled;

b) a reagent allowing the detection of the antigen-antibody complexes formed, said reagent optionally carrying a label or being itself recognizable by a labeled reagent, more particularly when the above-mentioned monoclonal or polyclonal antibody is not itself labeled.

Method For Screening Ligands That Modulate The Expression Of The hGGPPS Gene.

Another subject of the present invention is a method for screening molecules that modulate 
the expression of the hGGPPS protein. Such a screening method comprises the steps of: 

a) cultivating a prokaryotic or a eukaryotic cell that has been transfected with a nucleotide sequence encoding the hGGPPS protein or a variant or a fragment thereof, placed under the control of its own promoter;

b) bringing the cultivated cell into contact with a molecule to be tested;

c) quantifying the expression of the hGGPPS protein or a variant or a fragment thereof. 

In an embodiment, the nucleotide sequence encodes the hGGPPS protein or a variant or a fragment thereof, preferably a fragment comprising an allele of the biallelic marker 5-187-77, or the complement thereof.

In one embodiment of the invention, the method for the screening of a candidate substance or molecule modulating the expression of the hGGPPS gene comprises the following steps:

a) providing a recombinant host cell expressing a nucleic acid, wherein said nucleic acid comprises a nucleotide sequence selected from the group consisting of SEQ ID Nos 1, 2 and 3 or a fragment thereof;

b) obtaining a candidate substance; and

c) determining the ability of the candidate substance to modulate the expression levels of the nucleotide sequence selected from the group consisting of SEQ ID Nos 1, 2 and 3 or a fragment thereof.

Using DNA recombination techniques well known to one skilled in the art, the DNA sequence encoding the hGGPPS protein is inserted into an expression vector, downstream from its promoter sequence. As an illustrative example, the promoter sequence of the hGGPPS gene is contained in the nucleic acid of the 5' regulatory region.

The expression of the hGGPPS protein may be quantified either at the mRNA level or at the protein level. In the latter case, polyclonal or monoclonal antibodies may be used to quantify the amounts of hGGPPS protein produced, for example in an ELISA or RIA assay.

In a preferred embodiment, the hGGPPS mRNA is quantified by quantitative PCR amplification of the cDNA obtained by reverse transcription of the total mRNA of the cultivated hGGPPS-transfected host cell, using a pair of primers specific for hGGPPS.
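The text does not say how the quantitative PCR signal is converted into an expression level. One common option, assumed here and not stated in the text, is the comparative Ct (2^-ΔΔCt) method: the hGGPPS Ct is normalized to a reference transcript measured in the same sample, then compared between cells exposed and not exposed to the molecule. This sketch assumes close to 100% amplification efficiency for both amplicons; the reference gene and all Ct values are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target transcript (e.g. hGGPPS) in treated versus control
    cells by the comparative Ct (2^-ddCt) method.

    Assumes the target and the (hypothetical) reference amplicons are both
    amplified with ~100% efficiency, i.e. the product doubles every cycle.
    """
    # Normalize the target Ct to the reference transcript within each condition
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions
    dd_ct = d_ct_treated - d_ct_control
    # Reaching threshold one cycle earlier means twice as much starting template
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the candidate molecule lowers the hGGPPS Ct by one cycle
    print(relative_expression(23.0, 18.0, 24.0, 18.0))  # 2.0
```

A value above 1 would suggest that the candidate increases hGGPPS expression, and a value below 1 that it decreases it. If the two amplicons amplify with clearly different efficiencies, an efficiency-corrected model would be more appropriate.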

The present invention also concerns a method for screening substances or molecules that are able to increase, or conversely to decrease, the level of expression of the hGGPPS gene. Such a method may allow one skilled in the art to select substances exerting a regulating effect on the expression level of the hGGPPS gene, which may be useful as active ingredients in pharmaceutical compositions.

Thus, the present invention also encompasses a method for screening a candidate substance or molecule that modulates the expression of the hGGPPS gene, this method comprising the following steps:

- providing a recombinant host cell containing a nucleic acid, wherein said nucleic acid comprises a nucleotide sequence of the 5' regulatory region, or a biologically active fragment or variant thereof, located upstream of a polynucleotide encoding a detectable protein;

- obtaining a candidate substance; and

- determining the ability of the candidate substance to modulate the expression levels of the polynucleotide encoding the detectable protein.
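The last step compares reporter output with and without the candidate substance. A minimal sketch of that comparison, assuming replicate reporter readings in arbitrary units (for example GFP fluorescence or beta-galactosidase activity) and a plate background value; the 2-fold cut-off used to label an effect is a hypothetical choice, not one given in the text.

```python
from statistics import mean


def reporter_ratio(treated: list[float], untreated: list[float],
                   background: float = 0.0) -> float:
    """Ratio of the mean background-corrected reporter signal with and
    without the candidate substance."""
    t = mean(treated) - background
    u = mean(untreated) - background
    if u <= 0:
        raise ValueError("untreated signal must exceed background")
    return t / u


def classify(ratio: float, fold_threshold: float = 2.0) -> str:
    """Label the effect using a hypothetical fold-change cut-off."""
    if ratio >= fold_threshold:
        return "increases expression"
    if ratio <= 1.0 / fold_threshold:
        return "decreases expression"
    return "no clear effect"


if __name__ == "__main__":
    # Hypothetical duplicate wells for treated and untreated cells
    r = reporter_ratio([900.0, 1100.0], [450.0, 550.0], background=100.0)
    print(r, classify(r))  # 2.25 increases expression
```

Subtracting the background before taking the ratio avoids compressing real differences when the reporter signal is close to the plate background.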

In a further embodiment, the nucleic acid comprising the nucleotide sequence of the 5' regulatory region or a biologically active fragment or variant thereof also includes a 5'UTR region of the hGGPPS cDNA of SEQ ID Nos 2 or 3, or one of its biologically active fragments or variants.

Among the preferred polynucleotides encoding a detectable protein are polynucleotides encoding beta-galactosidase, green fluorescent protein (GFP) and chloramphenicol acetyltransferase (CAT).

The invention also pertains to kits useful for performing the screening methods described herein. Preferably, such kits comprise a recombinant vector that allows the expression of a nucleotide sequence of the 5' regulatory region, or a biologically active fragment or variant thereof, located upstream of and operably linked to a polynucleotide encoding a detectable protein or the hGGPPS protein or a fragment or a variant thereof.

In another embodiment, a method for the screening of a candidate substance or molecule that modulates the expression of the hGGPPS gene comprises the following steps:

a) providing a recombinant host cell containing a nucleic acid, wherein said nucleic acid comprises a 5'UTR sequence of the hGGPPS cDNA of SEQ ID Nos 2 or 3, or one of its biologically active fragments or variants, the 5'UTR sequence or its biologically active fragment or variant being operably linked to a polynucleotide encoding a detectable protein;

b) obtaining a candidate substance; and

c) determining the ability of the candidate substance to modulate the expression levels of the polynucleotide encoding the detectable protein.

In a specific embodiment of the above screening method, the nucleic acid that comprises a nucleotide sequence selected from the group consisting of the 5'UTR sequence of the hGGPPS cDNA of SEQ ID Nos 2 or 3 or one of its biologically active fragments or variants includes a promoter sequence which is endogenous with respect to the hGGPPS 5'UTR sequence.

In another specific embodiment of the above screening method, the nucleic acid that comprises a nucleotide sequence selected from the group consisting of the 5'UTR sequence of the hGGPPS cDNA of SEQ ID Nos 2 or 3 or one of its biologically active fragments or variants includes a promoter sequence which is exogenous with respect to the hGGPPS 5'UTR sequence defined therein.

In a further preferred embodiment, the nucleic acid comprises the 5'UTR sequence of the hGGPPS cDNA of SEQ ID Nos 2 or 3 or a biologically active fragment thereof, preferably a fragment including the biallelic marker 5-187-77 or the complement thereof.

The invention further deals with a kit for the screening of a candidate substance modulating the expression of the hGGPPS gene, wherein said kit comprises a recombinant vector that comprises a nucleic acid including a 5'UTR sequence of the hGGPPS cDNA of SEQ ID Nos 2 or 3, or one of their biologically active fragments or variants, the 5'UTR sequence or its biologically active fragment or variant being operably linked to a polynucleotide encoding a detectable protein.

For the design of suitable recombinant vectors useful for performing the screening methods described above, reference is made to the section of the present specification in which the preferred recombinant vectors of the invention are detailed.

Expression levels and patterns of hGGPPS may be analyzed by solution hybridization with long probes as described in International Patent Application No. WO 97/05277, the entire contents of which are incorporated herein by reference. Briefly, the hGGPPS cDNA or the hGGPPS genomic DNA described above, or fragments thereof, is inserted at a cloning site immediately downstream of a bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce antisense RNA. Preferably, the hGGPPS insert comprises at least 100 consecutive nucleotides of the genomic DNA sequence or the cDNA sequences. The plasmid is linearized and transcribed in the presence of ribonucleotides comprising modified ribonucleotides (e.g. biotin-UTP and DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated from cells or tissues of interest. The hybridization is performed under standard stringent conditions (40-50°C for 16 hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The unhybridized probe is removed by digestion with ribonucleases specific for single-stranded RNA (e.g. RNases CL3, T1, Phy M, U2 or A). The presence of the biotin-UTP modification enables capture of the hybrid on a microtitration plate coated with streptavidin. The presence of the DIG modification enables the hybrid to be detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline phosphatase.

Quantitative analysis of hGGPPS gene expression may also be performed using arrays. As used herein, the term array means a one-dimensional, two-dimensional, or multidimensional arrangement of a plurality of nucleic acids of sufficient length to permit specific detection of expression of mRNAs capable of hybridizing thereto. For example, the arrays may contain a plurality of nucleic acids derived from genes whose expression levels are to be assessed. The arrays may include the hGGPPS genomic DNA, the hGGPPS cDNA sequences or the sequences complementary thereto or fragments thereof, particularly those comprising the biallelic marker 5-187-77. Preferably, the fragments are at least 15 nucleotides in length. In other embodiments, the fragments are at least 25 nucleotides in length. In some embodiments, the fragments are at least 50 nucleotides in length. More preferably, the fragments are at least 100 nucleotides in length. In another preferred embodiment, the fragments are more than 100 nucleotides in length. In some embodiments the fragments may be more than 500 nucleotides in length.
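Selecting array fragments "comprising the biallelic marker 5-187-77" amounts to enumerating the windows of a given length that overlap the marker coordinate. A minimal sketch, using the marker position 14058 in SEQ ID No 1 (Table 2) and the sequence length of 17131 nucleotides from the Sequence Listing; the helper names are hypothetical. Because 5-187-77 is an insertion polymorphism, the sketch works in reference coordinates only; in the insertion allele, positions downstream of the inserted T shift by one.

```python
SEQ1_LENGTH = 17131  # length of SEQ ID No 1 (<211> in the Sequence Listing)
MARKER_POS = 14058   # position of biallelic marker 5-187-77 in SEQ ID No 1 (Table 2)


def windows_covering(pos: int, length: int, seq_len: int) -> list[tuple[int, int]]:
    """Return every 1-based, inclusive (start, end) window of `length` nucleotides
    that contains `pos` and lies entirely within a sequence of `seq_len` nucleotides."""
    if not 1 <= pos <= seq_len or length < 1 or length > seq_len:
        raise ValueError("position or length out of range")
    first = max(1, pos - length + 1)       # window whose last base is pos
    last = min(pos, seq_len - length + 1)  # window whose first base is pos, clipped at the 3' end
    return [(start, start + length - 1) for start in range(first, last + 1)]


def subsequence(seq: str, start: int, end: int) -> str:
    """Extract a 1-based, inclusive interval from a sequence string."""
    return seq[start - 1:end]


if __name__ == "__main__":
    # Minimum fragment lengths mentioned in the text
    for n in (15, 25, 50, 100):
        wins = windows_covering(MARKER_POS, n, SEQ1_LENGTH)
        print(n, len(wins), wins[0], wins[-1])
```

Given the full sequence of SEQ ID No 1 as a string, `subsequence` would return the actual fragment for any window produced above.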

For example, quantitative analysis of hGGPPS gene expression may be performed with a complementary DNA microarray as described by Schena et al. (1995 and 1996). Full-length hGGPPS cDNAs or fragments thereof are amplified by PCR and arrayed from a 96-well microtiter plate onto silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid chamber to allow rehydration of the array elements and rinsed once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in sodium borohydride solution. The arrays are submerged in water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with water, air dried and stored in the dark at 25°C.

Cell or tissue mRNA is isolated or commercially obtained, and probes are prepared by a single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14 mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency wash buffer (1x SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash buffer (0.1x SSC/0.2% SDS). Arrays are scanned in 0.1x SSC using a fluorescence laser scanning device fitted with a custom filter set. Accurate differential expression measurements are obtained by taking the average of the ratios of two independent hybridizations.
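The "average of the ratios of two independent hybridizations" can be written out as follows. This is a sketch assuming that each hybridization yields, for the hGGPPS spot, a test-channel intensity, a reference-channel intensity and a local background; these inputs and all numbers are hypothetical.

```python
def spot_ratio(test: float, reference: float, background: float = 0.0) -> float:
    """Background-corrected test/reference intensity ratio for one spot."""
    t, r = test - background, reference - background
    if r <= 0:
        raise ValueError("reference signal must exceed background")
    return t / r


def mean_ratio(hybridizations: list[tuple[float, float, float]]) -> float:
    """Average of the per-hybridization ratios, as described in the text."""
    ratios = [spot_ratio(t, r, b) for t, r, b in hybridizations]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Two independent hybridizations of the same hGGPPS spot (hypothetical values)
    print(mean_ratio([(1250.0, 650.0, 50.0), (950.0, 350.0, 50.0)]))  # 2.5
```

The per-hybridization ratios here are 2.0 and 3.0, so their arithmetic mean is 2.5, following the averaging step stated in the text.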

Quantitative analysis of hGGPPS gene expression may also be performed with full-length hGGPPS cDNAs or fragments thereof in complementary DNA arrays as described by Pietu et al. (1996). The full-length hGGPPS cDNA or fragments thereof are PCR amplified and spotted on membranes. mRNAs originating from various tissues or cells are then labeled with radioactive nucleotides. After hybridization and washing under controlled conditions, the hybridized mRNAs are detected by phosphor imaging or autoradiography. Duplicate experiments are performed, and a quantitative analysis of differentially expressed mRNAs is then carried out.

Alternatively, expression analysis using the hGGPPS genomic DNA, the hGGPPS cDNA, or fragments thereof can be done through high-density nucleotide arrays as described by Lockhart et al. (1996) and Sosnowski et al. (1997). Oligonucleotides of 15-50 nucleotides from the sequences of the hGGPPS genomic DNA or the hGGPPS cDNA sequences, particularly those comprising the biallelic marker 5-187-77, or from the sequences complementary thereto, are synthesized directly on the chip (Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowski et al., supra). Preferably, the oligonucleotides are about 20 nucleotides in length.
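Designing oligonucleotides "from the sequences ... or the sequences complementary thereto" can be sketched as tiling a sequence and taking reverse complements. The function names are hypothetical, and the demonstration uses the first 60 nucleotides of SEQ ID No 1 exactly as printed in the Sequence Listing. A real probe design would also filter the oligonucleotides, for example on base composition or uniqueness, which this sketch does not attempt.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(seq: str, length: int = 20, step: int = 1) -> list[tuple[int, str]]:
    """Return (1-based start, oligo) pairs tiling `seq` with oligos of `length` nt."""
    if not 15 <= length <= 50:
        raise ValueError("the text describes oligonucleotides of 15-50 nucleotides")
    return [(i + 1, seq[i:i + length]) for i in range(0, len(seq) - length + 1, step)]


if __name__ == "__main__":
    # First 60 nt of SEQ ID No 1, as given in the Sequence Listing
    seq = ("tcgggctccc" "tggttggggg" "gagggggacg"
           "acgaaaaatc" "ccccccggac" "tggaggtccg")
    for start, oligo in tile_oligos(seq, length=20, step=20):
        print(start, oligo, reverse_complement(oligo))
```

A step of 1 gives every overlapping 20-mer, which is closer to the dense tiling used on high-density chips. A step of 20 is used above only to keep the printed output short.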

hGGPPS cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or a fluorescent dye, are synthesized from the appropriate mRNA population and then randomly fragmented to an average size of 50 to 100 nucleotides. The probes are then hybridized to the chip. After washing as described in Lockhart et al., supra, and application of different electric fields (Sosnowski et al., 1997), the dyes or labeling compounds are detected and quantified. Duplicate hybridizations are performed. Comparative analysis of the intensity of the signal originating from cDNA probes on the same target oligonucleotide in different cDNA samples indicates differential expression of hGGPPS mRNA.

Throughout this application, various publications, patents and published patent applications are cited. The disclosures of these publications, patents and published patent specifications referenced in this application are hereby incorporated by reference into the present disclosure to more fully describe the state of the art to which this invention pertains.

EXAMPLES 
Example 1 : 

Analysis of the mRNAs encoding the hGGPPS polypeptide of SEQ ID No 4 synthesized by the cells.

Human GGPPS cDNA was obtained as follows: 4 µl of ethanol suspension containing 1 mg of human prostate total RNA (Clontech Laboratories, Inc., Palo Alto, USA; Catalogue No. 64038-1) was centrifuged, and the resulting pellet was air dried for 30 minutes at room temperature.

First strand cDNA synthesis was performed using the Advantage™ RT-for-PCR kit (Clontech Laboratories Inc., catalogue No. K1402-1). 1 µl of a 20 mM solution of a specific oligo dT primer was added to 12.5 µl of RNA solution in water, heated at 74°C for 2.5 min and rapidly quenched in an ice bath. 10 µl of 5x RT buffer (50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2), 2.5 µl of dNTP mix (10 mM each) and 1.25 µl of human recombinant placental RNase inhibitor were mixed with 1 µl of MMLV reverse transcriptase (200 units). 6.5 µl of this solution were added to the RNA-primer mix and incubated at 42°C for one hour. 80 µl of water were added and the solution was incubated at 94°C for 5 minutes.

5 µl of the resulting solution were used in a long-range PCR reaction with hot start, in a 50 µl final volume, using 2 units of rTth XL and 20 pmol/µl of each of the 5'-TGGAGAAGACTCAAGAAACAGTCCAAA-3' (from the nucleotide in position 86 to the nucleotide in position 112 of SEQ ID No 1) and 5'-CCTGGAAGCAAGTCTTTTTTATTGACG-3' (from the nucleotide in position 1285 to the nucleotide in position 1311 of SEQ ID No 1) primers, with 35 cycles of elongation for 6 minutes at 67°C in a thermocycler.
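The primer coordinates given above can be checked against the primer sequences, and they give the size of the expected product. A small sketch, taking the positions exactly as stated in the text (relative to SEQ ID No 1); the helper names are hypothetical, and GC content is reported only as a descriptive figure.

```python
PRIMERS = {
    # name: (5'->3' sequence, first position, last position) as stated in the text
    "forward": ("TGGAGAAGACTCAAGAAACAGTCCAAA", 86, 112),
    "reverse": ("CCTGGAAGCAAGTCTTTTTTATTGACG", 1285, 1311),
}


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in an oligonucleotide."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def amplicon_length(fwd_start: int, rev_end: int) -> int:
    """Length of the product from the forward primer's 5' end to the position
    matched by the reverse primer's 5' end (both inclusive)."""
    return rev_end - fwd_start + 1


if __name__ == "__main__":
    for name, (seq, start, end) in PRIMERS.items():
        # Each primer length should match the position range given for it
        assert len(seq) == end - start + 1, name
        print(name, len(seq), f"GC={gc_fraction(seq):.2f}")
    size = amplicon_length(PRIMERS["forward"][1], PRIMERS["reverse"][2])
    print("expected product:", size, "bp")
```

Both primers are 27-mers, consistent with the 86-112 and 1285-1311 ranges. The expected product therefore spans positions 86 to 1311, or 1226 bp.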

The amplification products corresponding to both cDNA strands were partially sequenced in order to ensure the specificity of the amplification reaction.

Results of Northern blot analysis of prostate mRNAs support the existence of a hGGPPS cDNA which corresponds to the nucleotide sequence of SEQ ID No 1.

Example 2 :

Detection of hGGPPS biallelic markers: DNA extraction

Donors were unrelated and healthy. They presented sufficient diversity to be representative of a heterogeneous French population. The DNA from 100 individuals was extracted and tested for the detection of the biallelic markers.

30 ml of peripheral venous blood were taken from each donor in the presence of EDTA. Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed with a lysis solution (50 ml final volume: 10 mM Tris pH 7.6; 5 mM MgCl2; 10 mM NaCl). After resuspension of the pellet in the lysis solution, the solution was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate the residual red cells present in the supernatant.

The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution composed of:

- 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M
- 200 µl SDS 10%
- 500 µl proteinase K (2 mg proteinase K in TE 10-2 / NaCl 0.4 M).

For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) was added. After 
vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm. 

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA pellet was rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20 minutes at 2000 rpm. The pellet was dried at 37°C and resuspended in 1 ml TE 10-1 or 1 ml water. The DNA concentration was evaluated by measuring the OD at 260 nm (1 OD unit = 50 µg/ml DNA).

To determine the presence of proteins in the DNA solution, the OD 260 / OD 280 ratio was determined. Only DNA preparations having an OD 260 / OD 280 ratio between 1.8 and 2 were used in the subsequent examples described below.

The pool was constituted by mixing equivalent quantities of DNA from each individual. 
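The quality-control and pooling arithmetic of this example can be sketched as follows. The 50 µg/ml per OD unit conversion and the 1.8-2.0 purity window come from the text. The donor identifiers, the OD readings, the dilution factors and the 1 µg contributed per individual are hypothetical, because the text does not state the mass used for the pool.

```python
UG_PER_ML_PER_OD260 = 50.0  # 1 OD unit at 260 nm = 50 ug/ml DNA (from the text)


def dna_concentration(od260: float, dilution: float = 1.0) -> float:
    """DNA concentration in ug/ml from the OD at 260 nm of a (diluted) sample."""
    return od260 * UG_PER_ML_PER_OD260 * dilution


def passes_purity(od260: float, od280: float, low: float = 1.8, high: float = 2.0) -> bool:
    """Keep only preparations whose OD260/OD280 ratio lies between 1.8 and 2."""
    return low <= od260 / od280 <= high


def pooling_volumes_ul(conc_ug_per_ml: dict[str, float], ug_each: float) -> dict[str, float]:
    """Volume (ul) of each sample that contributes the same DNA mass to the pool."""
    # ug divided by ug/ml gives ml; multiply by 1000 to express it in ul
    return {donor: ug_each * 1000.0 / c for donor, c in conc_ug_per_ml.items()}


if __name__ == "__main__":
    # Hypothetical readings per donor: (OD260, OD280, dilution factor)
    readings = {"donor_01": (0.50, 0.27, 10.0), "donor_02": (0.40, 0.25, 10.0)}
    kept = {d: dna_concentration(a, f)
            for d, (a, b, f) in readings.items() if passes_purity(a, b)}
    print(kept)                                   # donor_02 fails (ratio 1.6)
    print(pooling_volumes_ul(kept, ug_each=1.0))  # equal mass from each kept donor
```

Computing volumes from a fixed mass per donor, rather than pooling equal volumes, is what makes each individual contribute an "equivalent quantity" of DNA when concentrations differ.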


Example 3 : 

Detection of the biallelic markers: amplification of genomic DNA by PCR 

The amplification of specific genomic sequences of the DNA samples of Example 2 was carried out on the pool of DNA obtained previously. In addition, 50 individual samples were similarly amplified.

PCR assays were performed using the following protocol:

Final volume: 25 µl
DNA: 2 ng/µl
MgCl2: 2 mM
dNTP (each): 200 µM
primer (each): 2.9 ng/µl
AmpliTaq Gold DNA polymerase: 0.05 unit/µl
PCR buffer (10x = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl): 1x
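The protocol gives per-microlitre concentrations. Multiplying by the 25 µl final volume gives the amount per reaction, and the C1·V1 = C2·V2 relation gives pipetting volumes. A sketch of both calculations; the stock concentrations used for MgCl2, dNTPs and buffer are hypothetical, since the text does not give them.

```python
FINAL_VOLUME_UL = 25.0

# Per-microlitre concentrations taken from the protocol above
PER_UL = {"DNA (ng)": 2.0, "each primer (ng)": 2.9, "AmpliTaq Gold (units)": 0.05}


def per_reaction(per_ul: dict[str, float], volume_ul: float) -> dict[str, float]:
    """Total amount of each component in one reaction."""
    return {k: v * volume_ul for k, v in per_ul.items()}


def stock_volume_ul(final_conc: float, stock_conc: float, volume_ul: float) -> float:
    """C1*V1 = C2*V2: volume of stock needed to reach final_conc (same units)."""
    return final_conc * volume_ul / stock_conc


if __name__ == "__main__":
    print(per_reaction(PER_UL, FINAL_VOLUME_UL))
    # Hypothetical stock concentrations (not stated in the text)
    print("MgCl2 from a 25 mM stock:", stock_volume_ul(2.0, 25.0, FINAL_VOLUME_UL), "ul")
    print("each dNTP from a 10 mM (10000 uM) stock:",
          stock_volume_ul(200.0, 10000.0, FINAL_VOLUME_UL), "ul")
    print("10x PCR buffer:", stock_volume_ul(1.0, 10.0, FINAL_VOLUME_UL), "ul")
```

Each 25 µl reaction therefore contains 50 ng of template DNA, 72.5 ng of each primer and 1.25 units of polymerase.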
Each pair of first primers was designed using the sequence information of the hGGPPS gene disclosed herein and the OSP software (Hillier & Green, 1991). This first pair of primers was about 20 nucleotides in length and had the sequences disclosed in Table 1 in the columns labeled PU and RP.


Table 1

Amplicon | Position range of the amplicon in SEQ ID No 1 (genomic) | Position range of amplification primer in SEQ ID No 1 (genomic) | Complementary position range of amplification primer in SEQ ID No 1 (genomic)
5-187 | 13982-14409 | 13982-14000 | 14390-14409


The sequences of the amplification primers B1 and C1 are disclosed in SEQ ID Nos 8 and 9, respectively.


Preferably, the primers contained a common oligonucleotide tail upstream of the specific bases targeted for amplification, which was useful for sequencing. Primers PU contain the following additional PU 5' sequence: TGTAAAACGACGGCCAGT (SEQ ID No 10); primers RP contain the following RP 5' sequence: CAGGAAACAGCTATGACC (SEQ ID No 11).
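Combining Table 1, Table 2 and the two tail sequences gives the expected sizes for amplicon 5-187 and confirms that marker 5-187-77 lies inside it. A sketch using only coordinates stated in the document; the helper names are hypothetical. The sizes refer to the reference allele: the T-insertion allele of 5-187-77 yields a product one base longer.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"  # SEQ ID No 10
RP_TAIL = "CAGGAAACAGCTATGACC"  # SEQ ID No 11

AMPLICON = (13982, 14409)  # amplicon 5-187 in SEQ ID No 1 (Table 1)
PU_SITE = (13982, 14000)   # PU primer position range
RP_SITE = (14390, 14409)   # RP primer (complementary) position range
MARKER_POS = 14058         # biallelic marker 5-187-77 (Table 2)


def span(interval: tuple[int, int]) -> int:
    """Length of a 1-based, inclusive interval."""
    start, end = interval
    return end - start + 1


def contains(interval: tuple[int, int], pos: int) -> bool:
    """True if pos falls within the inclusive interval."""
    start, end = interval
    return start <= pos <= end


if __name__ == "__main__":
    genomic = span(AMPLICON)
    tailed = genomic + len(PU_TAIL) + len(RP_TAIL)  # one tail added at each end
    print("specific amplicon:", genomic, "bp; with tails:", tailed, "bp")
    print("PU/RP specific parts:", span(PU_SITE), span(RP_SITE), "nt")
    print("marker inside amplicon:", contains(AMPLICON, MARKER_POS))
```

The two 18-nt tails (the M13 forward and reverse universal primer sequences) add 36 bp to the 428 bp genomic amplicon. This lets the same universal sequencing primers be used for every amplicon.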

The synthesis of these primers was performed following the phosphoramidite method, on a 
GENSET UFPS 24.1 synthesizer. 

DNA amplification was performed on a Genius II thermocycler. After heating at 95°C for 10 min, 40 cycles were performed. Each cycle comprised 30 sec at 95°C, 1 min at 54°C and 30 sec at 72°C. A final elongation of 10 min at 72°C ended the amplification. The quantities of the amplification products obtained were determined in 96-well microtiter plates, using a fluorometer and PicoGreen (Molecular Probes) as the intercalating agent.
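A thermocycling program like the one above can be written as data, which makes its total hold time easy to compute. This sketch encodes the steps stated in the text. The `Step` type is a hypothetical helper, and ramp times between temperatures are ignored, so the real run time on the instrument will be longer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float
    seconds: int


# Program described in the text (ramping between temperatures not included)
INITIAL = [Step(95, 600)]
CYCLE = [Step(95, 30), Step(54, 60), Step(72, 30)]
N_CYCLES = 40
FINAL = [Step(72, 600)]


def hold_time_s(initial: list[Step], cycle: list[Step], n_cycles: int,
                final: list[Step]) -> int:
    """Total time spent at set temperatures, excluding ramping."""
    total = sum(s.seconds for s in initial) + sum(s.seconds for s in final)
    total += n_cycles * sum(s.seconds for s in cycle)
    return total


if __name__ == "__main__":
    t = hold_time_s(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(t, "s =", t / 60, "min")  # 6000 s = 100.0 min
```

The same function applies to the microsequencing program of Example 5 (4 min at 94°C followed by 20 cycles of 15, 5 and 10 sec), giving 840 s of hold time.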


Example 4 : 

Detection of the biallelic markers: sequencing of amplified genomic DNA and identification of polymorphisms.

The sequencing of the amplified DNA obtained in Example 3 was carried out on ABI 377 sequencers. The sequences of the amplification products were determined using automated dideoxy terminator sequencing reactions with a dye terminator cycle sequencing protocol. The products of the sequencing reactions were run on sequencing gels and the sequences were determined using gel image analysis.

The sequence data were further evaluated to detect the presence of biallelic markers among the pooled amplified fragments. The polymorphism search was based on the presence of superimposed peaks in the electrophoresis pattern resulting from different bases occurring at the same position, as described previously.

Table 2 shows the biallelic marker that was detected after the sequence analysis of the amplification fragments generated by PCR.


Table 2

Amplicon | Marker name | Localization in hGGPPS gene | Polymorphism | BM position in SEQ ID No 1 | Position of a probe in SEQ ID No 1
5-187 | 5-187-77 | Intron 3 | Insertion T | 14058 | 14036-14081


The two alleles of the biallelic marker 5-187-77 can be defined by an oligonucleotide comprising the polymorphic base. The sequences of such oligonucleotides are disclosed in SEQ ID Nos 5 and 6.

Example 5 : 

Validation of the polymorphisms through microsequencing 

The biallelic marker identified in Example 4 was further confirmed through microsequencing. Microsequencing was carried out for each individual DNA sample described in Example 2.

Amplification from the genomic DNA of individuals was performed by PCR as described above for the detection of the biallelic markers, with the same set of PCR primers (Table 1).

The preferred primers used in microsequencing were about 20 nucleotides in length and hybridized just upstream of the considered polymorphic base. According to the invention, the primer used in microsequencing is detailed in Table 3.

Table 3

Marker Name | Microsequencing primer
5-187-77 | SEQ ID No 7


The microsequencing reaction was performed as follows:

After purification of the amplification products, the microsequencing reaction mixture was prepared by adding, in a 20 µl final volume: 10 pmol microsequencing oligonucleotide, 1 U Thermosequenase (Amersham E79000G), 1.25 µl Thermosequenase buffer (260 mM Tris-HCl pH 9.5, 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye Terminator Set 401095) complementary to the nucleotides at the polymorphic site of each biallelic marker tested, following the manufacturer's recommendations. After 4 minutes at 94°C, 20 PCR cycles of 15 sec at 55°C, 5 sec at 72°C and 10 sec at 94°C were carried out in a Tetrad PTC-225 thermocycler (MJ Research). The unincorporated dye terminators were then removed by ethanol precipitation. Samples were finally resuspended in formamide-EDTA loading buffer and heated for 2 min at 95°C before being loaded on a polyacrylamide sequencing gel. The data were collected by an ABI PRISM 377 DNA sequencer and processed using the GENESCAN software (Perkin Elmer).

Following gel analysis, data were automatically processed with software that allows the determination of the alleles of biallelic markers present in each amplified fragment.

The software evaluates such factors as whether the intensities of the signals resulting from the above microsequencing procedures are weak, normal, or saturated, or whether the signals are ambiguous. In addition, the software identifies significant peaks (according to shape and height criteria). Among the significant peaks, peaks corresponding to the targeted site are identified based on their position. When two significant peaks are detected for the same position, each sample is categorized as homozygous or heterozygous based on the height ratio.
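The last step, calling a genotype from the peak height ratio, can be illustrated with a small decision function. This is a hypothetical sketch, not the software used in the example. The minimum peak height and the heterozygosity ratio cut-off are assumptions, because the text states only that the call depends on the height ratio of the significant peaks. Allele labels "A" and "B" stand for the two fluorescent ddNTPs.

```python
def call_genotype(height_a: float, height_b: float,
                  min_height: float = 100.0, het_ratio: float = 0.3) -> str:
    """Call a genotype from the two allele-specific peak heights at the target site.

    min_height and het_ratio are hypothetical thresholds chosen for illustration.
    """
    sig_a = height_a >= min_height
    sig_b = height_b >= min_height
    if not (sig_a or sig_b):
        return "no call"  # no significant peak at the targeted position
    if sig_a and sig_b:
        # Both peaks significant: compare the smaller to the larger
        ratio = min(height_a, height_b) / max(height_a, height_b)
        if ratio >= het_ratio:
            return "heterozygous"
    # One dominant peak: homozygous for the allele with the taller peak
    return "homozygous A" if height_a >= height_b else "homozygous B"


if __name__ == "__main__":
    for peaks in [(1000.0, 900.0), (1000.0, 50.0), (150.0, 1000.0), (20.0, 30.0)]:
        print(peaks, call_genotype(*peaks))
```

In practice, cut-offs of this kind would be calibrated on samples of known genotype, because the two dye terminators need not give equal signal intensities.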

Example 6 : 

Preparation of Antibody Compositions to the hGGPPS protein

Substantially pure protein or polypeptide is isolated from transfected or transformed cells containing an expression vector encoding the hGGPPS protein or a portion thereof. The concentration of protein in the final preparation is adjusted, for example by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibodies to the protein can then be prepared as follows:

A. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibodies to epitopes in the hGGPPS protein or a portion thereof can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C. (1975) or derivative methods thereof. See also Harlow, E. and D. Lane (1988).

Briefly, a mouse is repeatedly inoculated with a few micrograms of the hGGPPS protein or a portion thereof over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells are destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate, where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall (1980), and derivative methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2.

B. Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the hGGPPS protein or a portion thereof can be prepared by immunizing a suitable non-human animal with the hGGPPS protein or a portion thereof, which can be unmodified or modified to enhance immunogenicity. The suitable non-human animal, preferably a non-human mammal, is usually a mouse, rat, rabbit, goat, or horse. Alternatively, a crude preparation which has been enriched for hGGPPS can be used to generate antibodies. Such proteins, fragments or preparations are introduced into the non-human mammal in the presence of an appropriate adjuvant known in the art (e.g. aluminum hydroxide, RIBI, etc.). In addition, the protein, fragment or preparation can be pretreated with an agent which will increase antigenicity; such agents are known in the art and include, for example, methylated bovine serum albumin (mBSA), bovine serum albumin (BSA), Hepatitis B surface antigen, and keyhole limpet hemocyanin (KLH). Serum from the immunized animal is collected, treated and tested according to known procedures. If the serum contains polyclonal antibodies to undesired epitopes, the polyclonal antibodies can be purified by immunoaffinity chromatography.

Effective polyclonal antibody production is affected by many factors related both to the antigen and to the host species. Also, host animals vary in their response to the site of inoculation and to the dose, with both inadequate and excessive doses of antigen resulting in low-titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. Techniques for producing and processing polyclonal antisera are known in the art; see, for example, Mayer and Walker (1987). An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al. (1971).

Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer, as determined semi-quantitatively (for example, by double immunodiffusion in agar against known concentrations of the antigen), begins to fall. See, for example, Ouchterlony, O. et al. (1973). The plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (roughly 0.7 to 1.3 µM for IgG of about 150 kDa). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D. (1980).
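The plateau range of 0.1 to 0.2 mg/ml can be converted to molar units by dividing by the molecular weight of the antibody. This sketch assumes the antibody is IgG of about 150 kDa. Pentameric IgM, at roughly 900 kDa, would give a proportionally lower molarity for the same mass concentration.

```python
IGG_MW_G_PER_MOL = 150_000.0  # approximate molecular weight of IgG (~150 kDa)


def mg_per_ml_to_um(conc_mg_per_ml: float, mw_g_per_mol: float = IGG_MW_G_PER_MOL) -> float:
    """Convert a protein concentration from mg/ml to micromolar.

    mg/ml equals g/l, so dividing by g/mol gives mol/l; multiplying by 1e6 gives uM.
    """
    return conc_mg_per_ml / mw_g_per_mol * 1e6


if __name__ == "__main__":
    for c in (0.1, 0.2):
        print(f"{c} mg/ml ~ {mg_per_ml_to_um(c):.2f} uM")
```

With these assumptions, 0.1 and 0.2 mg/ml correspond to about 0.67 and 1.33 µM of IgG.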

Antibody preparations prepared according to either the monoclonal or the polyclonal protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.

While the preferred embodiment of the invention has been illustrated and described, it will be appreciated that various changes can be made therein by one skilled in the art without departing from the spirit and scope of the invention.

SEQUENCE LISTING FREE TEXT

The following free text appears in the accompanying Sequence Listing:
Homology with sequence in ref
Polymorphic base insertion of
Complement
Diverging amino acid in ref
Artificial sequence
Sequencing oligonucleotide primer

<110> Genset SA

<120> A nucleic acid encoding a geranyl-geranyl-pyrophosphate synthase (GGPPS) and polymorphic markers associated with said nucleic acid.

<130> D.18362

<150> US 60/093,940
<151> 1998-07-23

<160> 11

<170> Patent.pm


<210> 1
<211> 17131
<212> DNA
<213> Homo sapiens

<220>
<221> exon
<222> 486..546
<223> exon 1

<220>
<221> exon
<222> 633..826
<223> exon 1bis

<220>
<221> exon
<222> 7292..7384
<223> exon 2

<220>
<221> exon
<222> 13760..13830
<223> exon 3

<220>
<221> exon
<222> 14063..15251
<223> exon 4

<220>
<221> misc_feature
<222> 486..546
<223> homology with sequence in ref embl: AA398854

<220>
<221> misc_feature
<222> 7292..7384
<223> homology with sequence in ref embl: AA398854

<220>
<221> misc_feature
<222> 13760..13830
<223> homology with sequence in ref embl: AA398854

<220>
<221> misc_feature
<222> 14063..14314
<223> homology with sequence in ref embl: AA398854

<220>
<221> misc_feature
<222> 633..826
<223> homology with sequence in ref embl: Z44596

<220>
<221> misc_feature
<222> 7292..7384
<223> homology with sequence in ref embl: Z44596

<220>
<221> misc_feature
<222> 13760..13830
<223> homology with sequence in ref embl: Z44596

<220>
<221> misc_feature
<222> 14243..14670
<223> homology with sequence in ref embl: AA435858

<220>
<221> misc_feature
<222> 15055..15251
<223> homology with sequence in ref embl: AA194600

<220>
<221> misc_binding
<222> 14036..14081
<223> 5-187-77

<220>
<221> allele
<222> 14058
<223> 5-187-77 polymorphic base insertion of T

<220>
<221> primer_bind
<222> 13982..14000
<223> 5-187.pu

<220>
<221> primer_bind
<222> 14390..14409
<223> 5-187.rp complement

<220>
<221> misc_feature
<222> 1847..1848, 6130, 6145, 10814, 12943, 13125, 14874..14875, 14917, 15085..15086
<223> n=a, g, c or t






<400> 1 







tcgggctccc tggttggggg gagggggacg acgaaaaatc ccccccggac tggaggtccg 60
ggcccccaat cgcgctgccc tccagaggac ggcggcgatg gaccctctgc agctccctcc 120
gggcaaaggt ccaggcggtg gccgtggcgg cggcaagatg aagctcaaga gtctccctcc 180
gcttcggcga ccgagctcct cactccggac tcgactgacg ggcaaacatc gcttcccccc 240
caccgactct aggttccccc cctttctccc ctcccctaga ttttttttcc ccccctcccc 300
tacctctttc ccggatggcc tcttagacga ccttggattg gttaaagttc tttagaaccc 360
gcctatacac tgttcctatt ggtccctgga tacaaacaac gacgccattt tcccaccagt 420
tctatggaaa cagaaagtta cgcctcaagg ctttctggga aataaagtcc atactctggg 480
gccaacgcgc aaatcctcgt ccgcgagaac tgcaaggccc gcaatgccct gcgcctgcgt 540
ggaccggtgc

qggqqcqqqq 

qqqaqqtqaa 

aqqqqcqqqq 

caacaaagca 

y i-ayyyaggc 

60 0 

ggcaacgacg 

cctgcgcagt 

gtgaccggga 

tggcgcattt 

tcttgcacca 

l. d d uy t*yy 

660 

tgtcgctggc 

ggctqaqgaq 

ggcgqagaqt 

ZJ ZJ ZJ Z> ZJ ZJ 

tctgtggtga 

aatagtggga 

dy y d L. ULctLy 

72 0 

taggcatcgg 

gaagagecta 

agtccacatt 

ataaaatagg 

aagttgatgc 

yyyy uat-ay t- 

7 80 

tactcccgga 

ccqqcqqcqt 

gaaagtcgtg 

atatcatcgt 

tgaactgtga 

gcggcagtgg 

UA A 
0*tU 

^'zjzj^'zjzj^ ^~z3z) 

ggggaacccg 

gatgggaaga 


aqqctqqqaq 

geggggcaga 


ggaaagaaag 

aaaggagagt 

gaggaccegg 

atgetgaace 

qqattqtqta 

tgaatt ttcc 

you 

atcccctagc 

tt taagegag 

qaqqqaaaqq 

aaqggt tqqc 

caagtggggc 

ggaagggagc 

lU<iU 

atctqaqcqa 

qqagcfa ao c a 

gaaacctcac 

cgtttcttcc 

cctccggact 

Cy L.y C LdyC 

1 ADA 

1UOU 

actgtatacg 

tttgcagttc 

tctgcccagc 

cqctqtqqaa 

aatcggcctc 

y ddy Uydtty 

ii/tn 
-L X^t U 

aaattccctg tttatatcag gcggcttctt tcagatccat cgtctttctc ccggagtatg 1200


tt cagtatgc 

acttcacatt 

tgtatgtctc 

tggecat tct 

caaaccaggc 

1260 

ccttcccttt gaaaagtctt ttgcatggga tgttcacttc ttagacgcaa ggttgtgtgc 1320

r* c t aa t t t c a 

t cert" rtaa pa 

fCit" t a era snn 
wy l» i*ciy ciciy y 

cant tt cat t 

t c 1 1 c a t aaa 

tgttgagcgc 

1380 



y l. i>_- y iwCiy 

1^ y ^— y v.- <»- 

yy oyay wuyy 

acagatgetg 

144 0 


t* at" a a a a t* t* a 

^■<-y trfCty L.yy l. 


dy y dy dy lui. 

caeggtgata 

1500 

aaa csp\ a 1* aa cr 

yy aa cui-yyt 

1~tpttt~aaaa 
tuuuti. y y y 

anaaa a a aaa 
yyy "^**yy ^* 

a aaa 1 1 1 r* t a 

agcaagtgag 

1560 

aatcaaacta 

a era a p t aa a a 
yy y 

aa rfaara aa 
yy<~ ucty -—ayy 

aattaactaa 

aaaaaaaoaa 

aaggaaaaga 

1620 

pa t* t* r*r*acra c 

1»>CL L« L_» Imp I_pGLU Gilt— 


apt tat* paaa 

aaornrr'tata 

a dy w 1^ U- y l> y 

etc aaa a a nna 
y-yy^yyy"- 

gcttttccaa 

1680 

t*aj* cta a fia a r* 

t*aa ar*r*1~ aaa 

CI y *^ L^UUCL 

ya^auy^tjau 

yciyyyyydy u 

at paa arrtf 

ttaggctttg 

1740 

taaaggagt t 

ttaattttct 

rrt"33ha apa 

atgggatat c 

t t ccaaggaa 

tctcaatcaa 

1800 

aagggagaga 

^y y w V—* *w y ci i*~ 

t nrfa a t* at ra 
cyy cl ci um LvQ 

t r* rr t aar t a 

aagagtnnag 

gaagcgaaaa 

1860 

a a 25 of a a era cr t* 

t*aaa rra ("rCTf^* 
i— udciyciyy 'uCi 

cici L,y dy y y a 

ciwuwy o>»yciy 

y c»y y^uau l. y 

ccgtagtagt 

192 0 

t cacatggtg 

a aa acta a trier 

y y " ^- y ^- d 

t taataatta 

taaa M*r , arf 

ctttgaacaa 

1980 

a t" t 1" c i~ aap a 

y O L L L L L Qy L 

t"t*t"aaaaat*a 
i— <— i— y aady L-y 

dy ddy lul^u 

aar*+*r*t*r*ar*t* 
y d i *w v_ ^ 0. 

gaggtattct 

204 0 

gtagfcttttt 

cactctaaaa 

yy cv-CLC^^ Lp dy w 

agagt tcatg 

taacacacac 

taatgectet 

2100 

ttacatttaa ctttagtatg tgatagctga aatttccagc tgtgataaat tgggaaatcc 2160

tttgatttaa 

aagaaaaaca 



tatatgecac 

ggtgtgtaga 

2220 

atcctt taga 

ctcttaagaa 

gacacaaggc 


ggtggctcac 

gcttgtaatc 

2280 

ccagcacttt 


yy^yyy^yyct 

tcacgaggtc 

aggagatcga 

gaccatcctg 

2340 

gctaacacgg tgaaagcccg tctctactaa aaatacaaaa aaattagccg ggcaaggtgg 2400

ccrcrcrccrc eta 

tagtcccagc 

tactcaoaaa 
^ ci^- L -^-yyyo.y 

octaaoacaa 
y ^* zj -3 3/ y 

gagaatggcg 

tgaacccggg 

2460 

aqacaaaq t t 

tgcagtgaga 

cgagatcacg 

ccactgcact 

ccagcctggg 

cyaCdgayLy 

252 0 

agacgctgtt tcagaagaaa gacacaaggc aagttggttg tcgatacctg gaaaaattga 2580

agttcttatg 

ttt tcatacc 

actaaaaata 
t* w u 

rttctatata 

aat at cctc t 

999 aca 99 aa 

2 64 0 

attgacttaa 

gtgagt at tc 

1 1 aaara t r*t 

r*t*a aat a aaa 

c* c* y y c* c*. i«ci 

ttttttaaag 

2700 

cataattagt gttttaagtt gaaaaataac atcaaccaca aagctctacg aattgaaaca 2760
aagattagct ctgatttctg tgcaacaggg tacacctgtt acaggtcctg acacaaaagg 2820
gaattctgaa agtgcatctc attgattttt aagttcggtc aaatgtgttt tggaggctgt 2880
gagaaaatat acaaacgtga ttcttgctcc caacttgtag ttgagaaaag atagatacta 2940
acatttaaat agagaagtat atgagatcct tttttaattc tacttttaat gatgttcgat 3000
aataatcttt tagctaagcc attattcttc ctgttttgca tcttcttttc ttacttcaat 3060
ccctgataat aaggtcacgt gtcagagatc aaatagtata ggtaataggt tacctaaata 3120

ggtatttgca taataggtta cctaactaaa taggtttttg cctaataggt atgttgatta 3180 

tttcgcttac ttgattcttt atgagccttt ttttccttgc gacgtctttg gtattaattg 3240 

ttagtcaaga tggatgtaga aattttccat atgggatgtt tctctttgaa ttcatgttgt 3300

taaaatgatt tcttttggtg gagtgctgat cttttttatg attgtttcat atagataaga 3360 

acagactaca aaaaaatatg cctttcaatc ctgaagagta acctgaacta tacactagtt 3420 

ttgtgcttta attttcattt gtaatctgcc ttcaataaag agttaagcta gtggaattta 3480

tgtcttagct tgttataaca caaacacgaa tatttgtctg cttggcatta aagggtaaag 3540

atattccata gctgggaatc ttaatctgag gtacgtgtaa acattcaggg actatatgat 3600 

ctctgagaat ttgtatgttg taagtctttg tggcagtgta tacatttgtg ttgcaactta 3660 

ttaacacata caccgggctt tttttttttt ttttagaaga ttcatagctt tcatcatatt 3720 

ctcaaaaggt ttctgtgacc catgagatgg tttacagtat ggggaagcat caaagcactt 3780 

gcacagttga tggttatatg tgtgtgttat tatttcagcc acccattatc atgtgcttac 3840 

caactgccta acagtgcata catatgtaga agttttattc ttttctcctg ttgccatatt 3900 

atacgtctca tttcacagca gaaaaacaac tgcatgacag agacaatgtg gttcaaacca 3960

ttttaccctt gtattcattg actgctacaa aacaggaaca ttaaatacct gattgtcacc 4020 

aaattgggta gtctcagcac ttctacactc gtaattgtgc tggaaaagtg gaatgctagc 4080 

actaataatt agattttggt ttggagggtt ttttatttgt ttattcttac ttgtataaat 4140 

ttatggggtg caagtgtagt tttatcacat gcatagattg cattgtagtg aagtcaggac 4200 

ttttaggggg tccatcaccc atgtaatcac gttgtaccca ttaagtaatc tttcatcatc 4260 

cacctccttc ccaccttctc accctttgga atctccattg tctatcattc cacactccat 4320 

gtccatgtat acacattatc tagctcccat ttataattga gaagatgtac tatttgtctt 4380 

ttatgtctga cttgttacac ttaaggtaag ggctatccat ccattttgct gcaaatgaca 4440 

tgatttcatt ttgttttaat ggctgagtaa tcattcgttg tatatatacc acattttctt 4500 

tattcagtca tctgctgatg gacacttagg ttgattccat atctttacta ttgtgaatag 4560 

tgctgtaata aacacatagt gcaagatttt ggaaatttta cttttgtggc acgttgttgg 4620 

tatttactca ggatctttgg atttgcttgg ctgcatgtat atgaatcagt gtgtttattt 4680 

actgaaatat gtgcaaaagt cttgtctttg gtggattaat ttataatata aatccacaaa 4740 

agtcagattc tgctcctaag tatattttac atttttaaat ttaatgccag caagaagtta 4800 

cagtactaga attgccttac ccctgagagt atcaatgatc agatcatagt atcaggtgac 4860

tgggctatag aagatgactt ttattactta acattatgaa gttactaggg ctgatttaga 4920 

aatcgaggaa cactggtgaa accccgtctc tactaaaata caaaaattag ctgggcgtgg 4980

tggtgggcac ctgtagtccc agctactcag aaggctgagt caggagaatt gcttgagccc 5040 

aggaggcaga ggttgcagtg agccgagatc gtgccactgc actccagcct gggcgacaga 5100 

gtgagactcc gtctcaaaaa aaaaaaaaaa aaaaaaaaag gaacacatcc tcactgttac 5160 

aataaataac agtagcccac acccccttag ttgtgatgtg gtgtgatacc atgtaagcaa 5220 

cctatttcca gttcccctaa cattctcaag cagctgtatc agaatcatac aagatgcata 5280

tttaaattga agatttctaa gtctctggcc cagacttaga aaaaaaggat caggccgggc 5340 

acagtagcta acacctgcaa ttccaacact ttgggaggct gaggcgggtg gatcgcctga 5400 

ggtcaggagt tttgagacca gcctggccaa catagtgaaa ccccatctct actaaaaatt 5460 

caaaaaatta gctgggcgtg gtggcaagaa cctgtaatcc ctgctattcg ggaggctgag 5520 

gcaggggaat cacttgaacc cgggaggtgg aggttgcagt gagccaagat tgcgccactg 5580 
cactccagcc

tgggcaacga 

gcaaaactcc 

gtctcaaaaa 

aaaaaaacaa 

d. d Clw ^— Lw- Iw 

564 0 

gagcaatcag aataacacaa agtacatgaa ctgaacttca ttttcttcat tcaaaagaaa 5700
gtggccctca ctcaagcaaa tatattcttg tgctttatct tctggcatac tgagataact 5760

ttctaaagtg 

gtttccaatt 

ccaaaatcca 

atgatgtgca 

actcattgaa 

caaccrtaar 
^ c *y^ , * v *^ «*.duv 

5820 

cacaaactgc 

cattagatgc 

catattacat 

ttagcctttt 

tgttgtagaa 

a a a t - 1* crcr t~ t* a 

5880 

gaagtgggct 

caggattcta 

aagactaaat 

catagtccca 

agaagcaaaa 

ydaaydyyaL, 

c Q/i n 

D J^t U 

aaaagtaata 

aacttcccaa 

aatgtgccaa 

agatgetaga 

gcagt tagat 

tCCtaauaLg 

DUUU 

aggacaagta 

ataatagaaa 

cagatacaaa 

gaaataaagt 

agagattcaa 

ay cac ay y y 

dUdU 

agaccctagg aagaccatga gtgttattct aggaaatact gaaataagac agatttcagt 6120

ataaaggggn 

aatatgttta 

ataanatata 

tgcatttgag 

t t aatgcgt a 

utctaaacca 

6180 

gaaatctctg 

aaatggat tg 

at tgtagaga 

aactactagg 

nana cci?i a era 

gaatcccttt 


aaattttaaa 

tacataaaac 

atactcatct 

tagtgetcat 

tt aaaaaagg 

a. La Cy LLLdC 


taattagtgt 

aa.tcagt.taa 

at ac agaggt 

atctttccaa 

ttctttaaat 

gegcut tgac 

^ *i £ A 

atttgccgtc aacaaattaa gccttttgtg gttgattaaa ataggaaaag cttaatataa 6420

gttatgtgac 

taagaaaaca 



ft" t* t*cfa rr*^ ^ 

tataatcact 

6480 

tciaataaacia 

attttctaat 

tgagat ataa 

tt t acat acc 

acccatt taa 

agtgtacat t 

£T ct a n 
ob4 O 

tcagcagttt ttagtgtatt cacagggctg tgcaaccatc acaatttaat tttataacat 6600
tttgatccct gcgaaaagaa accctgtact cattagcaat tagtccctgt tcctaaccac 6660
taatctactt tctttctctg tagattggct tattctgaac atttcgtata aatggaatca 6720
tacaatatgt agtctcttga gattggcttc tttcacttaa catgttttca aggcttcata 6780
gctgtagaat cttgctttgt ttttttgaga ctggagtcac tctttcgccc aggctggagt 6840
gcagtggtgt gatctcagct cactgcaacc tctgcctccc gggttcaagc agttctcctg 6900

cctcagcct c 

ccaagtagcc 

aaaactaraa 

gcacacacca 

\— V_- ^-j y 

ctaatctttg 

6960 

tagttttagt 

"yy^^yy *-y 


t_<— l_ CI. C* W ^— ■ 


gatctaccca 

7020 

cctcagctaa 

tttttcatat 

t t ttagtaga 

gacaaggt tt 


ccc aggc tgg 

7080 

tctcgaactc ctgggcttaa gctatccgcc cgcctcagcc tcccaaagtg ctgggattac 7140
aggcgtgaac taccgtgccc agcaacagaa tcttcttttt aaaccagact aggtgtcttt 7200
tcacaaacac 

cctgcaatac 

aaattccttt 

gcagtttgac 

actgaaagat 

y a. u Lay l. l. 

/«DU 

atgtgatctt 

tatgtttctc 

ctttttgaca 

gattagcttt 

gaagtttaaa 

t" nr 1 2 a /-*r~t ^ 

LLLaatyyay 

/ J^U 

aagactcaag 

aaacagtcca 

aagaattctt 

ctagaaccct 

ataaatactt 

"LUUt c*y L. L. a. 

/JOV 

ccaggtaata 

cttcacttac 

agtccatata 

gggtcatttt 

catgeagtag 

tggucgtt.ca 

"7 j4 rt 

aatgttagca 

aatagaaaag 

gttagacttg 

ctagccgttg 

acrattt tc ta 

u u uaaggega 

■7 c n n 

tqcqtatqaq 

aaaaatgata 

aatagaacat 

tataattttt 

tctttattaa 

a a rr/r-t- a a +• +• +• 
aa^^LaaLLC 


ttgccaggtq 

cagtgataca 

tacctgttgt 

cccacctact 

tqqqacracta 

aggcaggagg 

t <r o ri 

atggcttgag cccaggagtt taaggctata gtgcacaatg atcacacctg tgaatagcca 7680

ctacactcca 

gcttgggcaa 

cat agt gaga 

ccc ccs t ct* r* t* 

w W I* >— * W w 

UuuCLQ.ClMa.aCL 

cgtaattttt 

7740 

y aaggcaccc 

tttaaaacat 

atccaat tat 

4** X. _ A_ 

ttaacatatc 

ttgaaaaata 

aaaatactta 

7800 

aaacattttg gtatctcatt ggaggttgta ctctttacgg atattacgca ttcagattcc 7860
ccactgttta gatattaggg gaagttacgc agatttgttt aacagtagaa cactttattt 7920
accatacatg ttcaagttta ccttctatgt ctgtattttc cagtatctca cacatacact 7980
gcatttcata tactactggt tcctttgaga gccaaataat aatgtatcta aaatcacagt 8040
atttggaaat atagcccact ttattcctgt ataagggtat gccaccttgg acatggcttc 8100
ctacctcacg tgtacgtgtg tgtttttgtt ttattttgct tctttaaaaa cttgtctgga 8160
ggctgggcgt ggtggctcac gcctgtaatc ccagcacttt ttgaggccaa ggcgggcgga 8220
tcatgaggtc aagaggttga gaccagcctg gccaacatgg taaaaccccg tctctactaa 8280
aaacacaaaa gttagctggg catggtggcg catgcctgta gtcccagcta ctcgggaggc 8340
tgaggcagga gaatcacctg aacctggaag gcagaggttg cagtgagctg agattgcatc 8400
actgcactcc agcctggcaa cagaatgaga ctccgactca aaaaaaaaga agaacttgtc 8460
tggaaatgat aataagcaaa aactcatgaa tataataaac aggggttatt gtaataaaaa 8520
atcatttgta ttagaatatt ctttctcata gacataatat aggccaggtg tggtggccca 8580
cacctgtatt cccagcactt tgggaagcca aggcaggatt gcttgagacc aaaagtttga 8640
gaccaccttg ggcaacataa caagtccccc tctctgtttt aaacattttt taaaaaagaa 8700
gaaataatat aaaagttggt aaattatttg acaagcataa aaacctattt agccatactg 8760
tgactaaact ctaatgatgc tctcaattca gtctcaatag acacttttaa atttccgtgc 8820
taaagtacac acctttcttt atgagcactt ctctgtggta atatgtgcat ttctgttctt 8880
catgagcctg ggaaggataa aagccaaaag aatgcttgct cctgtgctac accttggaaa 8940
ccataattag tgtcattttt attttggccg accctaatag agactcgcct gctaatgtca 9000
atgcatgaga agaatgaggg aatgacagaa atggagaatt caaagggaag gttgcccact 9060
gtttaagaaa aagccaagag actgcttttg agtgacattt atccagcagt tagtaactta 9120
tttcagtatc tcccagtgag aaacatggca cagtttcact ttcactctac ccagctctta 9180
ctgccagaca tcctttagaa cacgctcaca aacactagct ggaactgggc tggcattaat 9240
agcaagccag ttatcagtgc tgacaaaagt ctaacaagca tcgcttgaat gtctcttact 9300
ctgctactta caaagcaagg actgcctaca gttacatttt aaccataatg cttacttatg 9360
ctgtgaccac cttctgtgac ttcctttttt ttaattctca ttacttggaa ataatgtttt 9420
aagacattag ataacatatt taaaattatc actaggtacc tcaccttttt attcaagtac 9480
gttcttgatc catgatggaa tacaacctca aaagatacta ctaaagaaat atgacattgc 9540
actatgcaca taacacactt atttttttac agagagcttc agagttacta aagtaactta 9600
gaggtgtgcc aggtcattta tactgttgta atattactct tgctaataaa taataataat 9660
gctatcagta ttttctgaag tcaacctggc caacatggtg aaaccctgta tctactaaaa 9720
atacaaatat tagccaagta tggtagcgca tgcctgtagt cccagctgag gcacgggagt 9780
cacaggagcc taggaggcag aggttgcagt gagccgagat cacgccactg cactccagcc 9840
tgggcaacag agtgagacac tgtctcaaaa aaaaaaaagg attttctgaa attagtaaag 9900
aaaattattt ttatttttaa atttctcata cttgctgtca tcttatgttt atgtttgttt 9960
atttgcctta gtgtggggcc ctagatgagg tgaagggtgg gattagggag agatgaagct 10020
ggcagtggag gaagaagggc tccaaaaaga gagacaataa tgtttagatc ttaaagagga 10080
agcagtaatc ttttaatttt gagagatctc tgtgattagc ctcagtacta gaaattattt 10140
tggaactcag ccaggcgcgg tggctcacat ctgtactccc agcactttgg gagaccgaag 10200
tgggcagatg gcttaagccc aggagttcaa gaccagcctg ggcaacatgg caaaaccctg 10260
tctctactaa aaatacaaaa aattagccag gcatgtgata cgcccttgta gtcccagctt 10320
acctggggga ctgaggtggg atgattaccg gagcctggga ggttgaggct gcagtaagcc 10380
aagatcacac cactgcaccc cagcctgggt gattaaggga gaccccgtct cagaaaaaaa 10440
aaaggggggg aaacttaaaa gcatcaggct aaacactagc atgtcatcag aggggaaaaa 10500
aatattaaaa ctgtagtacc tcaaaaataa gccatatatt gtactgtttt ctatataaca 10560
ttcaaaagta aaatgaaaaa tgaaatttca cattgagact ctgtttttca tcttcaaaaa 10620
aatgtgttta

agtgatacag 

gecaagtgea 

QtQQctcract 

tattatccca 

gcactttggg 

10680 

aggccaagtg 

ggacagattg 

ettttgagee 

cacrcfqcrtttq 

agaccagcct 

gggcaacagg 

10740 

gcgaaaccct 

gect ctacaa 

aaaataaata 

aataaaaat a 

aaattaarna 

ggcatggtgg 

10800 

cttgttcttg 

tagntcccag 

cfcactcaggg 

gact tgagee 

1 3 CTCT3 cro t~ C 3 

aggctgeagt 

10860 

aqqccqtaat 

tgtgccac tg 

cact ccagcc 

tqqqtqacaa 

^*333 

3 cxr*cf3 rT3 err* 

tgtctcaaaa 

10920 

ataataataa 

t a qcf c c crcrci c 

QtQQtqqcjt c 

acacctgtaa 

V#* V»* C*. ^-4 \^ CA ^ 

tegagaggee 

10980 

aaagcatgtg 

gacgacttga 

cj cf t c a q q a cr t 

t cgagaccag 

cr* t* crcr rr 3 3 r* 

atggggaaac 

11040 

cctgtctcta 

tt aaaagtac 

aaaaaat tgg 



atgtaatccc 

11100 

tacactttgg 


t crcr at aerate 

acctgaggt c 

3CJCI3 3 t" f" r*3 3 

a y3"" w^dd 

gaccagcctg 

11160 

gccaacatga 

tgaaaccgtc 

tctactaaaa 

at acaaaaaa 


ttagtggcgc 

11220 

acgactgtaa 

tcrraart*ar 
c* y v» <— - c< 

tcaQQacract 

craqq c aaaaa 


acctaggagg 

11280 

taoaocrt rar 


aatccrtcrar*3 

r-> f- rrf~ =1 r 1 rr* r* 3 
• L. y uct^^\^i_cl 

gcctgggcaa 

caagagcaaa 

11340 

actcgatctc 

a q3 3333333 

»y Cl C3- d C4. UUU a. 

t3r , 3A3333t" 

L.ciy t-ctyy t-.y 

uagtgacgca 

cacctgtaat 

11400 

w w w Ci \_ wCl^ U> 

I"*<*Tf"TfT 23 fm +- r~t 

'-•ggy^gyctg 

a era c 3 003 cr 3 
t *3 c * , ^* c *yy c *yci 

3 1* rr r* t* t~ 03 a 

cccaggaggc 

gaaggttgtg 

11460 

a t* cr 3 nr 1 ci 3 ft 

y i-y dy t,L-yciy 

LuaagdLCgC 


u t-cly C C t-cig 

gtgacagagc 

aaaacttcat 

11520 

r*t~ cm rsapp 

\-» l^CVk»ddClV_> 

ctct a. u a. a. a. C a. a 

ci L» d d ct ct. ct cl ct 


agcattttgg 

gaggecaaca 

11580 

^y etc* t_ l- 

ctc(_ Lydyyuc 


caccagcctg 

gecaacatag 

tgaaaccctg 

11640 

tctctactaa 


ci w cty s_> v_ ct.y y 

i_y Lyy tyyCa 

ggtgcctgta 

atcccagcta 

11700 

cttoqaacicrc 

1" cr 3 a a r* 3 n rr 3 
^•■y «y y v*ciy y ct 

craatccfcfctcr 

3 3 r" , f~ 1 f'3nr"Tf"7n 
uawUoauuyu 

gcggaggucg 

cagtgagccg 

11760 

agatcacacc 

d. iw i_y l. cn_ l. i^. i_ 

acr c c t a crcr t* cr 

3 P3 ansnrna 

ddU LLLdLLL 

ccaaaaaaaa 

11820 

aaaaagaaaa 

UCILLi L» L» d 

gttttaactt 

i~ t* t" 3 1" fit* 3 3 r* 

t-d LUt. tCCty 

aaaccttatc 

11880 

taaaat tagg 


c*3 t"crr*3 +*■ Ira 

i~ t~ +* 3 nr* artao 
U. L. L.cty k^ciy clcl 

aacccataga 

acatttttac 

11940 

taaataaact 

frnrra trier t~ t~ 

tttatctatc 


ct i„y UyclL-L.dC 

aatgacttct 

12000 


I- O i~ d t-dcici 

CT3 rr'hjpfr't- 1 - 

add L CagcCa 

ggcatggtgg 

cacatgegtg 

12060 

taatcccagc 

L-dw dy y dy 

y^t-yayyt-dg 

y dy ctd Ldyu c 

tgatct tggg 

a ggcggaggt 

12120 

tcjcaacrtciaci 

Cky L- ^ CX. 


i-^l ciy ^ k., L.y y 


gagactccgt 

12180 

ctcaaaaaca 

aaaaacaaaa 

agacctatct 

taaactttcc 

Clt"CI^33f7333 

aagatgatac 

12240 

tgttgggtga agtgactcaa cgtctgtaat ttcagcaatt tgggaggctg tagcggccgg 12300
attgcttgag cccaggagtt tgagaccagc ttgggcaaca tgggaacaca ctgtctctac 12360
aaaaacaaaa attaaccggg cgtggtcgct tgcacctata gtgccagcta ctcgggaggc 12420
tgaggtggag gctgcagtga gctgtgaaca caccactgca ctccagcctg ggtgacagag 12480
tgagaccctg tctcaaaaaa aaaagcaaga agcgcagtgg ctcacgcctg taatcccagc 12540
actttgggag gccgaggcgg gcggatcacg aggtcaggag atcgagacca tcctggctaa 12600
cacggtgaaa ccccgtctct actaaaaata caaaaaatga gccgggcgtg gtagcgggcg 12660
cctgtagtcc cagctactcg ggaggctgag gcaggagaat ggcgtgaacc cgggaggcgg 12720
agcttgcagt gagccgagat cgcgccactg cactccagcc tgggcgacag agcgagactc 12780
cgtctcaaaa aaaaaaaaaa aaaaaaaaac aagaaagaaa aaaagaagat actgaaaaat 12840
agatgtccct agtcaaaata atgagattag cttttgacta aactcaggat attaaaaggg 12900
aatacttcag tgcatgatga tctcattttt gaaaggaaag aancagagct tccccatctc 12960
taaaacctta attcaaagga gaaatagata atttcaagag gtatttttat gaggtaatag 13020
taaaatatat tttattaaca gtacctatag ttatgtaaaa taggtagtgc caattaactg 13080
acactaaact agcttcttgg cctggcgcag tggctcacgc ctgtnaatcc aaacactttg 13140
ggaggccgat gcgggtgtat cgcttgggct caggaattca aggccagcct gggcaacata 13200
ttaaaacccc ctttctataa aatatacaaa aattagccag gcatggtgtg tgcctgtagt 13260 
cccagatact caggaggctg aggcacgaga atcatgtgaa cccaggaggt ggagtttgca 13320 
gtgagccgag atcacgccac tgcactccag cctgggcaac agagcaaaac tctgtctcaa 13380
ataattaata aataaactag cttccttttc aaaaaaagaa ataaattagg tcctaagtcc 13440 
taaaagccca tcctacttta aaattgttta ttcaagttca gatgaaaaga gtggactagt 13500 
aggcaactga agtgctttag agtctcccgt gcctgcccta attttagaag gttgtgcact 13560 
ttatgatcca gatttctgag tggttgagaa tgagttattg agcagtgcaa ggcaagctct 13620
gcagtaggta atggattgat gaggctggat ttagcaagtc tgatcaatct aaaggaagtt 13680 
tctgaatgtg ttttttgtag ttaaaatact cataattaaa acacttatca cattgtcaca 13740 
ttttattttt aaattgcagg taaacaagtg agaaccaaac tttcacaggc atttaatcat 13800
tggctgaaag ttccagagga caagctacag gtattaggca actctaacct cattaatccc 13860
caagaaatta atagctgtcg cataaaaata ttcctagttc ttgattgaat ttagtcctca 13920 
tgcaagatat tattttatat tgaggttgct aaatatttat tagttgtgaa aattaacaca 13980 
cctgagactt tcataatctg ttaattaaac tgagtaagtt ttgaatagtt caaataagtg 14040 
aaattttcaa tttttttatt agattattat tgaagtgaca gaaatgttgc ataatgccag 14100 
tttactcatc gatgatattg aagacaactc aaaactccga cgtggctttc cagtggccca 14160 
cagcatctat ggaatcccat ctgtcatcaa ttctgccaat tacgtgtatt tccttggctt 14220 
ggagaaagtc ttaacccttg atcacccaga tgcagtgaag ctttttaccc gccagctttt 14280 
ggaactccat cagggacaag gcctagatat ttactggagg gataattaca cttgtcccac 14340 
tgaagaagaa tataaagcta tggtgctgca gaaaacaggt ggactgtttg gattagcagt 14400 
aggtctcatg cagttgttct ctgattacaa agaagattta aaaccgctac ttaatacact 14460 
tgggctcttt ttccaaatta gggatgatta tgctaatcta cactccaaag aatatagtga 14520 
aaacaaaagt ttttgtgaag atctgacaga gggaaagttc tcatttccta ctattcatgc 14580 
tatttggtca aggcctgaaa gcacccaggt gcagaatatc ttgcgccaga gaacagaaaa 14640 
catagatata aaaaaatact gtgtacatta tcttgaggat gtaggttctt ttgaatacac 14700 
tcgtaatacc cttaaagagc ttgaagctaa agcctataaa cagattgatg cacgtggtgg 14760 
gaaccctgag ctagtagcct tagtaaaaca cttaagtaag atgttcaaag aagaaaatga 14820 
ataatgttaa gccattcttg attggacctc atagcttatt ttagttaatc tttnntttgt 14880 
cttttagcct taccaccttt taaaaaattt gttattntcc agaaacagta aataggtgag 14940 
taggggtggt gcaagtgaat tcgttttcat ttagaagccc ctctgtacag ataatcaaaa 15000 
ttcaaagttg aaagaatcaa aagcagccac agttatgtag gtctgatttg aatgtcataa 15060 
ttgcagtgac aggacattgc caccnnctcg tatcctacta ccatcaatgt tgtgtttatt 15120 
ccgtcaataa aaaagacttg cttccaggaa tttttatcca tacactttct aactgtacta 15180
tctgggcagt tccaagccag tttctattag ctagctggac caaagaccac aaatctcttt 15240
ttttcctaaa cgctgctgta aggaatatct cacttttccc cccggaaaca ccctcactga 15300 
agtcttctat gaaaaggcct gataatgggc tgggcgcggt ggctcacgcc tgtaatccca 15360
gcactttggg aggccgaggc gggcagatca cgaggtcagg agatcgagac catcctgaca 15420
cggtgaaacc ctgtctctac taaaaataca aaaaattagc tgggcgtggt ggtgggcgcc 15480 
tgtagtccca gctactcggg aggctgaggc aggagaatgg tgtgaaccca ggaggcggag 15540 
cttgcagtga gccgagatag tgcctctgca ctccagcctg ggtgacagag cgagactccg 15600 
tctcaaaaaa aagggctgat aatgataaac agtgagcact ccggtccttt ttcttaggtt 15660
ttcctttttt ccttcctctc caccccacaa gttttgcttt ttaaccaagg tgtctctgct 15720

tgatgaaatt 

cacatgctag 

tctaaatctt 

tttttctccc 

ttgtaacatt 

t~ a t" a\~ ci c r* c 

157 8 0 

aaactggtta 

qtatatqcrqt 

acagcattcc 

ct ttccaatt 

y y y cj. 

aaaagagagt 

15840 

atgggatatt 

ttagaaggga 

gectttgaac 

cttafctatat 

tfcccccatca 

L-L.yciL.ctyL-yci 


caatcttaaa 

a craa t t a 1 1 1 

tcttacctta 

Oy V- Cl^ClClCl.Ciy 

l* ct uyycict-ctctci 

t~ cirri r*t" t~ t" t* c 


cttcccgccc 

acatcaccac 

rrrn^ r*fe fe rra 

ay a^cty L. cty y 

L.yL,L,L-yciciL-y 

gaaagtgagt 

t c n o n 
ibUz U 

aaocatctt t" 

cia u. y i~ is_ »- y 

cl L. Laaoyyda 

eiy eg i_ Ldgcc 

ugagagggee 

tgactgaaaa 

i c o r\ 

Cif'rir^CCr^PiPiCl 
y uuuo^cicicih 

ry^t" fe aa t* a fe r* 

ass /**a r* fe a a +* 

tayCttLCLa 

gtgect taac 

LLLydCCLyy 

1614 0 

ttaccagttt 

tctafcaafcfct 

r*fe a /"*a rrraa 

y di— uyaay 

LLdLCLyLyy 

L-ccdayayy l 

i conn 

J. 0 U 

aggaCaaaaa 

d. Cl CL Cl CL d CL Cl Cl 

Cl d a a CI. cty L- y 


t- CydtCLytL 

gacatcccaa 

16260 

A^foa a Ant" t* 


/*• f/" 3 at 

L» u UayaaaLd 


ggt t eta tag 

ca ugt-tactt 

1632 0 


u a u CL Let L a L 


aatcctcacc 

caagcattca 

acctaaatct 

163 80 

t tgaaaagtt 

Cfd7dt _ nr i t*f*r+" o 

yyy L y^ L y t - t - 


L. C Cadddudy 

LttdddLCuC 

ccattttaat 

1644 0 



a a a a ^~ ^t/*t 
cictcLL.L-eAL.yyL- 

J_ _ J_ X. _ _ 

uci u tig«.T_gt t- 

at agtatgga 

aagttgaact 

16500 

ttataaaccr 

atarttt-t- aa 

ctcLdy LctLL L- L 


acactgacta 

tagaaacaaa 

16560 

tt* aaaa fecit - c 

d ^ k_- I* L-Ci.Ct.y 

fea"t~aaaaa+-+* 
L-ctOctctactcat-L 

yuL L.eic±y Lay 

at. Luy lCCCl 

tgcctatcaa 

1662 0 

attaattttg 

crcctcrQtcrtt 

r*t*t- c a 1" t* a fe fe 

1— 1— Cl L L.CI L. L. 

L. Ct L. LUytt del 

LLLLaCCLty 

cctttgtcaa 

16680 

taacagaaat 

^ ^— ^»y ^ 

na al-t* rrri r-ta a 
yadLLyyydd 

L. L. L. tttLttt 

LL LLLLLyay 

aeggagttte 

16740 

ac tcttgttg 

C c c a. crcr c fe rr 

a ntrip a a fe r~fri 
cay y l. ct ct Lyy 

CyuyaLCLCd 

yCCCaCtgCd 

acctccacct 

16800 

cccgggttca agcgattctc ctgcctcagc ctcctaagta gctgggatta cagatgcctg 16860
ccatgttgcc tggctaattt tttttttttt ttttttttta agtagagatg gggtttcacc 16920
atgttggcca ggctggtgtt gaacttctga cctcaggtga tccagctgcc tcggcctccc 16980
aaagtactgg gattacaggc atgagccacc gcacccagcc aaattgggga cttttaacag 17040
tcattttacc tgtagaataa tcaaaactct tcacttgatc tgtagtcata gctattaaca 17100
cagaaaaatg aatgccagtt atgttgccat a 17131
<210> 2

<211> 1414 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 
<222> 85..987

<220> 

<221> polyA_signal 
<222> 1289..1294
<223> AATAAA 

<220> 

<221> misc_feature
<222> 1..477
<223> homology with sequence in ref embl: AA398854


PCT/IB99/01353 


<220> 

<221> misc_feature 
<222> 406..833
<223> homology with sequence in ref embl: AA435858

<220>
<221> misc_feature
<222> 1218..1414
<223> homology with sequence in ref embl: AA194600

<220>
<221> misc_feature
<222> 1037..1038, 1080, 1248..1249
<223> n=a, g, c or t

<400> 2 

cgcgcaaatc ctcgtccgcg agaactgcaa ggcccgcaat gccctgcgcc tgcgtggacc 60
gattagcttt gaagtttaaa tcca atg gag aag act caa gaa aca gtc caa 111
Met Glu Lys Thr Gln Glu Thr Val Gln
1 5

aga att ctt cta gaa ccc tat aaa tac tta ctt cag tta cca ggt aaa 159
Arg Ile Leu Leu Glu Pro Tyr Lys Tyr Leu Leu Gln Leu Pro Gly Lys
10 15 20 25

caa gtg aga acc aaa ctt tca cag gca ttt aat cat tgg ctg aaa gtt 207
Gln Val Arg Thr Lys Leu Ser Gln Ala Phe Asn His Trp Leu Lys Val
30 35 40

cca gag gac aag cta cag att att att gaa gtg aca gaa atg ttg cat 255
Pro Glu Asp Lys Leu Gln Ile Ile Ile Glu Val Thr Glu Met Leu His
45 50 55

aat gcc agt tta ctc atc gat gat att gaa gac aac tca aaa ctc cga 303
Asn Ala Ser Leu Leu Ile Asp Asp Ile Glu Asp Asn Ser Lys Leu Arg
60 65 70

cgt ggc ttt cca gtg gcc cac agc atc tat gga atc cca tct gtc atc 351
Arg Gly Phe Pro Val Ala His Ser Ile Tyr Gly Ile Pro Ser Val Ile
75 80 85

aat tct gcc aat tac gtg tat ttc ctt ggc ttg gag aaa gtc tta acc 399
Asn Ser Ala Asn Tyr Val Tyr Phe Leu Gly Leu Glu Lys Val Leu Thr
90 95 100 105

ctt gat cac cca gat gca gtg aag ctt ttt acc cgc cag ctt ttg gaa 447
Leu Asp His Pro Asp Ala Val Lys Leu Phe Thr Arg Gln Leu Leu Glu
110 115 120

ctc cat cag gga caa ggc cta gat att tac tgg agg gat aat tac act 495
Leu His Gln Gly Gln Gly Leu Asp Ile Tyr Trp Arg Asp Asn Tyr Thr
125 130 135

tgt ccc act gaa gaa gaa tat aaa gct atg gtg ctg cag aaa aca ggt 543
Cys Pro Thr Glu Glu Glu Tyr Lys Ala Met Val Leu Gln Lys Thr Gly
140 145 150

gga ctg ttt gga tta gca gta ggt ctc atg cag ttg ttc tct gat tac 591
Gly Leu Phe Gly Leu Ala Val Gly Leu Met Gln Leu Phe Ser Asp Tyr
155 160 165

aaa gaa gat tta aaa ccg cta ctt aat aca ctt ggg ctc ttt ttc caa 639
Lys Glu Asp Leu Lys Pro Leu Leu Asn Thr Leu Gly Leu Phe Phe Gln
170 175 180 185

att agg gat gat tat gct aat cta cac tcc aaa gaa tat agt gaa aac 687
Ile Arg Asp Asp Tyr Ala Asn Leu His Ser Lys Glu Tyr Ser Glu Asn
190 195 200

aaa agt ttt tgt gaa gat ctg aca gag gga aag ttc tca ttt cct act 735
Lys Ser Phe Cys Glu Asp Leu Thr Glu Gly Lys Phe Ser Phe Pro Thr
205 210 215

att cat gct att tgg tca agg cct gaa agc acc cag gtg cag aat atc 783
Ile His Ala Ile Trp Ser Arg Pro Glu Ser Thr Gln Val Gln Asn Ile
220 225 230

ttg cgc cag aga aca gaa aac ata gat ata aaa aaa tac tgt gta cat 831
Leu Arg Gln Arg Thr Glu Asn Ile Asp Ile Lys Lys Tyr Cys Val His
235 240 245

tat ctt gag gat gta ggt tct ttt gaa tac act cgt aat acc ctt aaa 879
Tyr Leu Glu Asp Val Gly Ser Phe Glu Tyr Thr Arg Asn Thr Leu Lys
250 255 260 265

gag ctt gaa gct aaa gcc tat aaa cag att gat gca cgt ggt ggg aac 927
Glu Leu Glu Ala Lys Ala Tyr Lys Gln Ile Asp Ala Arg Gly Gly Asn
270 275 280

cct gag cta gta gcc tta gta aaa cac tta agt aag atg ttc aaa gaa 975
Pro Glu Leu Val Ala Leu Val Lys His Leu Ser Lys Met Phe Lys Glu
285 290 295

gaa aat gaa taa tgttaagcca ttcttgattg gacctcatag cttattttag 1027
Glu Asn Glu *
300

ttaatctttn ntttgtcttt tagccttacc accttttaaa aaatttgtta ttntccagaa 1087
acagtaaata ggtgagtagg ggtggtgcaa gtgaattcgt tttcatttag aagcccctct 1147
gtacagataa tcaaaattca aagttgaaag aatcaaaagc agccacagtt atgtaggtct 1207
gatttgaatg tcataattgc agtgacagga cattgccacc nnctcgtatc ctactaccat 1267
caatgttgtg tttattccgt caataaaaaa gacttgcttc caggaatttt tatccataca 1327
ctttctaact gtactatctg ggcagttcca agccagtttc tattagctag ctggaccaaa 1387
gaccacaaat ctcttttttt cctaaac 1414


<210> 3 

<211> 1547 

<212> DNA 

<213> Homo sapiens 


<220> 
<221> CDS 
<222> 218..1120


<220> 

<221> polyA_signal 

<222> 1422..1427

<223> AATAAA 


<220> 

<221> misc_feature 
<222> 1..359
<223> homology with sequence in ref embl: Z44596


<220> 

<221> misc_feature
<222> 1170..1171, 1213, 1381..1382
<223> n=a, g, c or t


<400> 3 

gcgcattttc ttgcaccaac taatgcggtg tcgctggcgg ctgaggaggg cggagagttc 60
tgtggtgaaa tagtgggaag gattcatgta ggcatcggga agagcctaag tccacattat 120
aaaataggaa gttgatgcgg ggtacagtta ctcccggacc ggcggcgtga aagtcgtgat 180
atcatcgttg aactattagc tttgaagttt aaatcca atg gag aag act caa gaa 235

Met Glu Lys Thr Gln Glu
1 5

aca gtc caa aga atc ctt cta gaa ccc tat aaa tac tta ctt cag tta 283
Thr Val Gln Arg Ile Leu Leu Glu Pro Tyr Lys Tyr Leu Leu Gln Leu
10 15 20

cca ggt aaa caa gtg aga acc aaa ctt tca cag gca ttt aat cat tgg 331
Pro Gly Lys Gln Val Arg Thr Lys Leu Ser Gln Ala Phe Asn His Trp
25 30 35

ctg aaa gtt cca gag gac aag cta cag att att att gaa gtg aca gaa 379
Leu Lys Val Pro Glu Asp Lys Leu Gln Ile Ile Ile Glu Val Thr Glu
40 45 50

atg ttg cat aat gcc agt tta ctc atc gat gat att gaa gac aac tca 427
Met Leu His Asn Ala Ser Leu Leu Ile Asp Asp Ile Glu Asp Asn Ser
55 60 65 70

aaa ctc cga cgt ggc ttt cca gtg gcc cac agc atc tat gga atc cca 475
Lys Leu Arg Arg Gly Phe Pro Val Ala His Ser Ile Tyr Gly Ile Pro
75 80 85

tct gtc atc aat tct gcc aat tac gtg tat ttc ctt ggc ttg gag aaa 523
Ser Val Ile Asn Ser Ala Asn Tyr Val Tyr Phe Leu Gly Leu Glu Lys
90 95 100

gtc tta acc ctt gat cac cca gat gca gtg aag ctt ttt acc cgc cag 571
Val Leu Thr Leu Asp His Pro Asp Ala Val Lys Leu Phe Thr Arg Gln
105 110 115

ctt ttg gaa ctc cat cag gga caa ggc cta gat att tac tgg agg gat 619
Leu Leu Glu Leu His Gln Gly Gln Gly Leu Asp Ile Tyr Trp Arg Asp
120 125 130

aat tac act tgt ccc act gaa gaa gaa tat aaa gct atg gtg ctg cag 667
Asn Tyr Thr Cys Pro Thr Glu Glu Glu Tyr Lys Ala Met Val Leu Gln
135 140 145 150

aaa aca ggt gga ctg ttt gga tta gca gta ggt ctc atg cag ttg ttc 715
Lys Thr Gly Gly Leu Phe Gly Leu Ala Val Gly Leu Met Gln Leu Phe
155 160 165

tct gat tac aaa gaa gat tta aaa ccg cta ctt aat aca ctt ggg ctc 763
Ser Asp Tyr Lys Glu Asp Leu Lys Pro Leu Leu Asn Thr Leu Gly Leu
170 175 180

ttt ttc caa att agg gat gat tat gct aat cta cac tcc aaa gaa tat 811
Phe Phe Gln Ile Arg Asp Asp Tyr Ala Asn Leu His Ser Lys Glu Tyr
185 190 195

agt gaa aac aaa agt ttt tgt gaa gat ctg aca gag gga aag ttc tca 859
Ser Glu Asn Lys Ser Phe Cys Glu Asp Leu Thr Glu Gly Lys Phe Ser
200 205 210

ttt cct act att cat gct att tgg tca agg cct gaa agc acc cag gtg 907
Phe Pro Thr Ile His Ala Ile Trp Ser Arg Pro Glu Ser Thr Gln Val
215 220 225 230

cag aat atc ttg cgc cag aga aca gaa aac ata gat ata aaa aaa tac 955
Gln Asn Ile Leu Arg Gln Arg Thr Glu Asn Ile Asp Ile Lys Lys Tyr
235 240 245

tgt gta cat tat ctt gag gat gta ggt tct ttt gaa tac act cgt aat 1003
Cys Val His Tyr Leu Glu Asp Val Gly Ser Phe Glu Tyr Thr Arg Asn
250 255 260

acc ctt aaa gag ctc gaa gct aaa gcc tat aaa cag att gat gca cgt 1051
Thr Leu Lys Glu Leu Glu Ala Lys Ala Tyr Lys Gln Ile Asp Ala Arg
265 270 275

ggt ggg aac cct gag cta gta gcc tta gta aaa cac tta agt aag atg 1099
Gly Gly Asn Pro Glu Leu Val Ala Leu Val Lys His Leu Ser Lys Met
280 285 290

ttc aaa gaa gaa aat gaa taa tgttaagcca ttcttgattg gacctcatag 1150
Phe Lys Glu Glu Asn Glu *
295 300

cttattttag ttaatctttn ntttgtcttt tagccttacc accttttaaa aaatttgtta 1210
ttntccagaa acagtaaata ggtgagtagg ggtggtgcaa gtgaattcgt tttcatttag 1270
aagcccctct gtacagataa tcaaaattca aagttgaaag aatcaaaagc agccacagtt 1330
atgtaggtct gatttgaatg tcataattgc agtgacagga cattgccacc nnctcgtatc 1390
ctactaccat caatgttgtg tttattccgt caataaaaaa gacttgcttc caggaatttt 1450
tatccataca ctttctaact gtactatctg ggcagttcca agccagtttc tattagctag 1510
ctggaccaaa gaccacaaat ctcttttttt cctaaac 1547

<210> 4 

<211> 300 

<212> PRT 

<213> Homo sapiens 

<220>
<221> VARIANT
<222> 204
<223> diverging amino acid Leu in ref: GENESEQP R97565

<220>
<221> VARIANT
<222> 205
<223> diverging amino acid Gly in ref: GENESEQP R97565

<220>
<221> VARIANT
<222> 225
<223> diverging amino acid Ser in ref: GENESEQP R97565

<220>
<221> VARIANT
<222> 252
<223> diverging amino acid Lys in ref: GENESEQP R97565

<220>
<221> VARIANT
<222> 257
<223> diverging amino acid Gly in ref: GENESEQP R97565

<220>
<221> VARIANT
<222> 295
<223> diverging amino acid Ser in ref: GENESEQP R97565


<400> 4

Met Glu Lys Thr Gln Glu Thr Val Gln Arg Ile Leu Leu Glu Pro Tyr
1 5 10 15

Lys Tyr Leu Leu Gln Leu Pro Gly Lys Gln Val Arg Thr Lys Leu Ser
20 25 30

Gln Ala Phe Asn His Trp Leu Lys Val Pro Glu Asp Lys Leu Gln Ile
35 40 45

Ile Ile Glu Val Thr Glu Met Leu His Asn Ala Ser Leu Leu Ile Asp
50 55 60

Asp Ile Glu Asp Asn Ser Lys Leu Arg Arg Gly Phe Pro Val Ala His
65 70 75 80

Ser Ile Tyr Gly Ile Pro Ser Val Ile Asn Ser Ala Asn Tyr Val Tyr
85 90 95

Phe Leu Gly Leu Glu Lys Val Leu Thr Leu Asp His Pro Asp Ala Val
100 105 110

Lys Leu Phe Thr Arg Gln Leu Leu Glu Leu His Gln Gly Gln Gly Leu
115 120 125

Asp Ile Tyr Trp Arg Asp Asn Tyr Thr Cys Pro Thr Glu Glu Glu Tyr
130 135 140

Lys Ala Met Val Leu Gln Lys Thr Gly Gly Leu Phe Gly Leu Ala Val
145 150 155 160

Gly Leu Met Gln Leu Phe Ser Asp Tyr Lys Glu Asp Leu Lys Pro Leu
165 170 175

Leu Asn Thr Leu Gly Leu Phe Phe Gln Ile Arg Asp Asp Tyr Ala Asn
180 185 190

Leu His Ser Lys Glu Tyr Ser Glu Asn Lys Ser Phe Cys Glu Asp Leu
195 200 205

Thr Glu Gly Lys Phe Ser Phe Pro Thr Ile His Ala Ile Trp Ser Arg
210 215 220

Pro Glu Ser Thr Gln Val Gln Asn Ile Leu Arg Gln Arg Thr Glu Asn
225 230 235 240

Ile Asp Ile Lys Lys Tyr Cys Val His Tyr Leu Glu Asp Val Gly Ser
245 250 255

Phe Glu Tyr Thr Arg Asn Thr Leu Lys Glu Leu Glu Ala Lys Ala Tyr
260 265 270

Lys Gln Ile Asp Ala Arg Gly Gly Asn Pro Glu Leu Val Ala Leu Val
275 280 285

Lys His Leu Ser Lys Met Phe Lys Glu Glu Asn Glu
290 295 300

<210> 5
<211> 49
<212> DNA
<213> Artificial sequence
<400> 5

aagtgaaatt ttcaattttt ttattagatt attattgaag tgacagaaa 49

<210> 6
<211> 50
<212> DNA
<213> Artificial sequence
<400> 6

aagtgaaatt ttcaattttt tttattagat tattattgaa gtgacagaaa 50

<210> 7
<211> 19
<212> DNA
<213> Artificial sequence
<400> 7

tgaaattttc aattttttt 19

<210> 8
<211> 19
<212> DNA
<213> Artificial sequence
<400> 8

ctgagacttt cataatctg 19

<210> 9
<211> 20
<212> DNA
<213> Artificial sequence
<400> 9

atgagaccta ctgctaatcc 20

<210> 10 

<211> 18 

<212> DNA 

<213> Artificial Sequence 


<220> 

<221> misc_binding 
<222> 1..18

<223> sequencing oligonucleotide PrimerPU 


<400> 10 

tgtaaaacga cggccagt 18 

<210> 11 
<211> 18 
<212> DNA 

<213> Artificial Sequence 


<220> 

<221> misc_binding
<222> 1..18

<223> sequencing oligonucleotide PrimerRP


<400> 11

caggaaacag ctatgacc 18
What is claimed:

1. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at
least 12 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least 1 of the following nucleotide positions of SEQ ID No 1: 1-485, 547-632,
827-7291, 7385-13759, 13831-14062, 14671-15054, and 15252-17131.

2. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at 
least 12 nucleotides of SEQ ID No 2 or the complements thereof, wherein said contiguous span 
comprises at least 1 of the nucleotide positions 834-1217 of SEQ ID No 2. 

3. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at 
least 12 nucleotides of SEQ ID No 3 or the complements thereof, wherein said contiguous span 
comprises at least 1 of the nucleotide positions 967-1351 of SEQ ID No 3.

4. An isolated, purified, or recombinant polynucleotide consisting essentially of a
contiguous span of 8 to 50 nucleotides of any one of SEQ ID Nos 1-3 or the complement thereof,
wherein said span includes a hGGPPS-related biallelic marker in said sequence.

5. A polynucleotide according to claim 4, wherein said hGGPPS-related biallelic marker is
the biallelic marker 5-187-77.

6. A polynucleotide according to any one of claims 4 or 5, wherein said contiguous span is 
18 to 50 nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said 
polynucleotide. 

7. A polynucleotide according to claim 6, wherein said polynucleotide consists of said 
contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is at 
the center of said polynucleotide. 

8. A polynucleotide according to claim 6, wherein said polynucleotide consists essentially
of a sequence selected from the group consisting of SEQ ID Nos 5 and 6 and the complementary
sequences thereto.

9. A polynucleotide according to any one of claims 1-5, wherein the 3' end of said
contiguous span is present at the 3' end of said polynucleotide.
10. An isolated, purified, or recombinant polynucleotide consisting essentially of a
sequence selected from the group consisting of SEQ ID Nos 8-9. 


11. A polynucleotide according to any one of claims 4 or 5, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide and said biallelic marker is present at
the 3' end of said polynucleotide.

12. An isolated, purified, or recombinant polynucleotide consisting essentially of a
contiguous span of 8 to 50 nucleotides of any one of SEQ ID Nos 1-3 or the complement thereof,
wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide, and
wherein the 3' end of said polynucleotide is located within 20 nucleotides upstream of a
hGGPPS-related biallelic marker in said sequence.

13. A polynucleotide according to claim 12, wherein the 3' end of said polynucleotide is
located 1 nucleotide upstream of said hGGPPS-related biallelic marker in said sequence.

14. A polynucleotide according to claim 13, wherein said polynucleotide consists 
essentially of the sequence of SEQ ID No 7. 

15. An isolated, purified, or recombinant polynucleotide which encodes a polypeptide
comprising a contiguous span of at least 6 amino acids of SEQ ID No 4, wherein said contiguous
span includes at least one amino acid selected from the group consisting of a Phe residue at positions
204, 257, 295 of SEQ ID No 4, a Cys residue at position 205 of SEQ ID No 4, a Pro residue at
position 225 of SEQ ID No 4, and a Glu residue at position 252 of SEQ ID No 4.

16. A polynucleotide for use in a genotyping assay for determining the identity of the
nucleotide at a hGGPPS-related biallelic marker or the complement thereof.

17. A polynucleotide according to claim 16, wherein the polynucleotide is used in a 
hybridization assay.

18. A polynucleotide according to claim 16, wherein the polynucleotide is used in a 
sequencing assay. 


19. A polynucleotide according to claim 16, wherein the polynucleotide is used in an
enzyme-based mismatch detection assay.
20. A polynucleotide according to claim 16, wherein the polynucleotide is used in
amplifying a segment of nucleotides comprising said biallelic marker. 


21. A polynucleotide according to any one of claims 1-20 attached to a solid support. 

22. An array of polynucleotides comprising at least one polynucleotide according to claim 21.

23. An array according to claim 22, wherein said array is addressable. 

24. A polynucleotide according to any one of claims 1-20 further comprising a label.

25. A recombinant vector comprising a polynucleotide according to any one of claims 1-20. 

26. A host cell comprising a recombinant vector according to claim 25.

27. A non-human host animal or mammal comprising a recombinant vector according to 
claim 25. 

28. A method of genotyping comprising determining the identity of a nucleotide at a
hGGPPS-related biallelic marker or the complement thereof in a biological sample.

29. A method according to claim 28, wherein said biological sample is derived from a 
single subject. 

30. A method according to claim 29, wherein the identity of the nucleotides at said biallelic 
marker is determined for both copies of said biallelic marker present in said individual's genome. 

31. A method according to claim 28, wherein said biological sample is derived from 
multiple subjects.

32. A method according to any one of claims 28, further comprising amplifying a portion of 
said sequence comprising the biallelic marker prior to said determining step. 


33. A method according to claim 32, wherein said amplifying is performed by PCR. 
34. A method according to any one of claims 28-33, wherein said determining is performed
by a hybridization assay. 


35. A method according to any one of claims 28-33, wherein said determining is performed 
by a sequencing assay.

36. A method according to any one of claims 28-33, wherein said determining is performed 
by a microsequencing assay. 

37. A method according to any one of claims 28-33, wherein said determining is performed
by an enzyme-based mismatch detection assay.

38. An isolated, purified, or recombinant polypeptide comprising a contiguous span of at 
least 6 amino acids of SEQ ID No 4, wherein said contiguous span includes at least one amino acid
selected from the group consisting of a Phe residue at positions 204, 257, 295 of SEQ ID No 4, a
Cys residue at position 205 of SEQ ID No 4, a Pro residue at position 225 of SEQ ID No 4, and a 
Glu residue at position 252 of SEQ ID No 4. 

39. An isolated or purified antibody composition capable of selectively binding to an
epitope-containing fragment of a polypeptide according to claim 38, wherein said epitope comprises
at least one amino acid selected from the group consisting of a Phe residue at positions 204, 257, 295
of SEQ ID No 4, a Cys residue at position 205 of SEQ ID No 4, a Pro residue at position 225 of
SEQ ID No 4, and a Glu residue at position 252 of SEQ ID No 4.

40. A method for the screening of a candidate substance or molecule modulating the
expression of the hGGPPS gene, said method comprising the following steps:

a) providing a recombinant host cell expressing a nucleic acid, wherein said nucleic acid 
comprises a nucleotide sequence selected from the group consisting of SEQ ID Nos 1, 2 and 3 or a 
fragment thereof; 
b) obtaining a candidate substance, and

c) determining the ability of the candidate substance to modulate the expression levels of the 
nucleotide sequence selected from the group consisting of SEQ ID Nos 1, 2 and 3 or a fragment
thereof.

41. A method for the screening of a candidate substance or molecule modulating the
expression of the hGGPPS gene, said method comprising the following steps:
- providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid
comprises a nucleotide sequence of the 5' regulatory region or a biologically active fragment or
variant thereof located upstream of a polynucleotide encoding a detectable protein;

- obtaining a candidate substance; and 

- determining the ability of the candidate substance to modulate the expression levels of the 
polynucleotide encoding the detectable protein. 



42. A method of making a hGGPPS polypeptide, said method comprising

(a) providing a population of host cells comprising the nucleic acid of any of claims 1, 2 and 3; 
(b) culturing said population of host cells under conditions conducive to the expression of said
recombinant nucleic acid; 

whereby said polypeptide is produced within said population of host cells. 
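The positional limitations recited in claims 6, 7, 12 and 13 can be expressed as simple coordinate checks. The Python sketch below is illustrative only and rests on assumptions that the claims do not state: positions are 1-based and taken on the same strand, the biallelic marker occupies a single nucleotide position, and the center of an even-length polynucleotide is treated as the midpoint between its two central nucleotides. All function names are hypothetical.

```python
"""Hypothetical helpers illustrating the positional limits of claims 6, 7, 12 and 13.

Assumption: all coordinates are 1-based and refer to the same strand.
"""


def offset_from_center(probe_length: int, marker_index: int) -> float:
    """Distance (in nucleotides) between the marker and the probe center.

    marker_index is the 1-based position of the marker inside the probe.
    For even lengths the center falls between two nucleotides, so the
    smallest possible offset is 0.5.
    """
    if not 1 <= marker_index <= probe_length:
        raise ValueError("marker must lie inside the probe")
    center = (probe_length + 1) / 2
    return abs(marker_index - center)


def meets_claim_6(probe_length: int, marker_index: int) -> bool:
    """18 to 50 nt long, with the marker within 4 nt of the center."""
    return 18 <= probe_length <= 50 and offset_from_center(probe_length, marker_index) <= 4


def meets_claim_7(probe_length: int, marker_index: int) -> bool:
    """25 nt long, with the marker exactly at the center."""
    return probe_length == 25 and offset_from_center(probe_length, marker_index) == 0


def upstream_distance(primer_3prime_pos: int, marker_pos: int) -> int:
    """How many positions the marker lies downstream of the primer's 3' end."""
    return marker_pos - primer_3prime_pos


def meets_claim_12(primer_3prime_pos: int, marker_pos: int) -> bool:
    """Primer 3' end located within 20 nt upstream of the marker."""
    return 1 <= upstream_distance(primer_3prime_pos, marker_pos) <= 20


def meets_claim_13(primer_3prime_pos: int, marker_pos: int) -> bool:
    """Primer 3' end located 1 nt upstream of (immediately adjacent to) the marker."""
    return upstream_distance(primer_3prime_pos, marker_pos) == 1


if __name__ == "__main__":
    print(meets_claim_6(25, 13))    # True: marker at the center of a 25-mer
    print(meets_claim_7(25, 13))    # True
    print(meets_claim_12(100, 121))  # False: 21 nt upstream is outside the window
```

For example, a 25-nucleotide probe with the marker at its position 13 satisfies both claim 6 and claim 7. A primer whose 3' end lies 21 nucleotides upstream of the marker falls outside the 20-nucleotide window of claim 12.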


Please amend the following claims: 

1. A composition comprising an isolated, purified, or recombinant polynucleotide comprising [a
contiguous span of at least 12 nucleotides] any one of:
(a) the nucleotide sequence of SEQ ID No 1;
(b) the nucleotide sequence of SEQ ID No 2;

(c) the nucleotide sequence of SEQ ID No 3;

(d) a contiguous span of 8 to 50 nucleotides of any one of SEQ ID Nos 1, 2, or 3, wherein said
span includes the hGGPPS-related biallelic marker designated 5-187-77; and

(e) [or] the complements of (a), (b), (c) or (d) above [thereof, wherein said contiguous span
comprises at least 1 of the following nucleotide positions of SEQ ID No 1: 1-485, 547-632,
827-7291, 7385-13759, 13831-14062, 14671-15054, and 15252-17131].

10. A composition comprising an isolated, purified, or recombinant polynucleotide consisting 
essentially of a sequence selected from the group consisting of SEQ ID Nos 5 to 9 and the 
complementary sequences thereto [8-9]. 

15. A composition comprising an isolated, purified, or recombinant polynucleotide which 
encodes a polypeptide comprising a contiguous span of at least 6 amino acids of SEQ ID No 4, 
wherein said contiguous span includes at least one amino acid selected from the group consisting 
of a Phe residue at positions 204, 257, 295 of SEQ ID No 4, a Cys residue at position 205 of SEQ 
ID No 4, a Pro residue at position 225 of SEQ ID No 4 [, and a Glu residue at position 252 of 
SEQ ID No 4]. 
21. The polynucleotide according to [any one of claims 1-20] claim 1 attached to a solid support.

24. The [a] polynucleotide according to [any one of claims 1-20] claim 1 further comprising a
label. 

25. A recombinant vector comprising [a] the polynucleotide according to claim 1 [any one of 
claims 1-20]. 

26. A composition comprising: a host cell recombinant for the polynucleotide of claim 1
[comprising a recombinant vector according to claim 25].

27. A non-human host animal or mammal comprising [a] the recombinant vector according to
claim 25.

28. A method of genotyping comprising determining the identity of a nucleotide at a hGGPPS-
related biallelic marker or the complement thereof in a biological sample, wherein said hGGPPS-
related biallelic marker is 5-187-77.

32. The [A] method according to claim [any one of claims] 28, further comprising amplifying a 
portion of said sequence comprising the biallelic marker prior to said determining step. 

38. A composition comprising: an isolated, purified, or recombinant polypeptide comprising a 
contiguous span of at least 6 amino acids of SEQ ID No 4, wherein said contiguous span includes 
at least one amino acid selected from the group consisting of a Phe residue at positions 204, 257, 
295 of SEQ ID No 4, a Cys residue at position 205 of SEQ ID No 4, and a Pro residue at position 
225 of SEQ ID No 4 [, and a Glu residue at position 252 of SEQ ID No 4]. 

39. An isolated or purified antibody composition capable of selectively binding to an
epitope-containing fragment of a polypeptide according to claim 38, wherein said epitope 
comprises at least one amino acid selected from the group consisting of a Phe residue at positions 